Machine Learning (1)

Today I'm reviewing the machine learning concepts I learned back in school.

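Some application areas of machine learning
1. Handwriting recognition
2. Object recognition
3. Sentiment recognition ("I love this course" is positive; "I will take another course" is negative)
4. Remote sensing image classification (identifying the type of an area from an overhead view)
5. Web search
6. Speech recognition
7. Spam detection
8. Robotics
9. Healthcare
10. Fingerprint recognition
11. Face recognition
12. Autonomous driving
13. Tree species classification from satellite imagery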
Some free datasets
1. Image/Video Databases (comprehensive): https://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm
2. http://archive.ics.uci.edu/
3. handwritten digits: http://yann.lecun.com/exdb/mnist/
4. https://image-net.org/
Some terminology
1. Classification
2. Regression
3. Clustering
4. Label
5. Features
6. Loss function
7. Cost function
8. Accuracy
9. Example / sample (a single data point)
10. Training set (labeled samples)
11. Test set (unlabeled samples)
12. Model
13. Algorithms
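
To make a few of the terms above concrete, here is a minimal sketch. It assumes one common convention: the loss is the error on a single example, and the cost is the average loss over a dataset. The function names and the squared-error choice are illustrative assumptions, not something fixed by these notes.

```python
def squared_loss(y_true, y_pred):
    """Loss for ONE example: squared difference (an assumed choice of loss)."""
    return (y_true - y_pred) ** 2


def cost(y_trues, y_preds):
    """Cost over a dataset: the mean of the per-example losses."""
    losses = [squared_loss(t, p) for t, p in zip(y_trues, y_preds)]
    return sum(losses) / len(losses)


def accuracy(labels, predictions):
    """Fraction of predictions that exactly match their labels."""
    correct = sum(1 for l, p in zip(labels, predictions) if l == p)
    return correct / len(labels)


# Made-up values, for illustration only.
regression_cost = cost([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
classification_accuracy = accuracy(
    ["cat", "dog", "dog", "cat"],
    ["cat", "dog", "cat", "cat"],
)
```

The cost is typically used for regression-style outputs and accuracy for classification, which is why the two are computed on different toy data here.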
Some examples

Classification

Object classification and recognition

Training set and test set
Types of machine learning
Supervised learning: labeled data
Unsupervised learning: unlabeled data
Reinforcement learning: reward system

Supervised learning
● Supervised Learning - Classification
○ Features: image/pixels
○ Label: cat or dog
● Task: Given a picture of a cat or a dog, predict its label.

● Supervised Learning - Regression
○ Features: size of house
○ Label: house price
● Task: Given a house’s size, predict the selling price.
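
The regression task above can be sketched as follows. This is a minimal example that assumes a straight-line relationship between size and price and fits it with ordinary least squares, which is one common choice rather than the only one. The data values and the names `fit_line` and `predict` are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(size, slope, intercept):
    """Predict the label (price) from the feature (size)."""
    return slope * size + intercept


# Hypothetical training data: feature = house size, label = selling price.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(sizes, prices)
predicted_price = predict(90.0, slope, intercept)  # a size not seen in training
```

Real house prices depend on far more than size, so in practice there would be more features and the fit would not be exact.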

Recap:
● A model is trained on historical data that is labeled, i.e. has known ground truth (e.g. the previous house sales example).
● Once the model is trained, it can be tested on new data to predict the label.

Unsupervised learning
● What if no ground truth is available?
● Unsupervised Learning - Clustering
○ Features: length and width for different types of flowers
○ Label: none (unsupervised learning has no labels)
● Task: Cluster the data into similar groups/patterns.
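
One way to sketch the clustering task above is k-means, used here as an assumed example algorithm since the notes do not name one. The flower measurements are made up, and initialising the centroids with the first `k` points is a simplification chosen to keep the sketch deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, assignments)."""
    # Assumption: use the first k points as the initial centroids.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignments


# Hypothetical (length, width) measurements; note that no labels are given.
flowers = [(1.4, 0.2), (4.7, 1.4), (1.3, 0.2), (4.5, 1.5), (1.5, 0.3), (4.9, 1.5)]
centroids, assignments = kmeans(flowers, k=2)
```

The cluster indices carry no meaning on their own; only which points end up grouped together matters.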


Reinforcement learning
Reinforcement learning learns through trial and error which actions yield the greatest rewards.
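
The trial-and-error idea can be illustrated with a multi-armed bandit and an epsilon-greedy strategy. This is a deliberately simplified setting chosen for the sketch: it has actions and rewards but no states. The reward means, the noise level, and the name `run_bandit` are assumptions made for illustration.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which arm (action) pays the most."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward per arm
    counts = [0] * n_arms       # how often each arm was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random action
        else:
            # Exploit: pick the action that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Noisy reward; the agent never sees true_means directly.
        reward = rng.gauss(true_means[arm], 0.1)
        counts[arm] += 1
        # Incremental update of the running average.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = run_bandit([0.1, 0.5, 0.9])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
```

The `epsilon` parameter controls the trade-off: a higher value explores more and discovers good actions sooner, but wastes more pulls on poor ones.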

Deep learning
A family of machine learning methods that use deep architectures to learn high-level feature representations.

Parametric and non-parametric models
• Parametric models:
• Have a fixed number of parameters
• Faster to use, simpler
• Make stronger assumptions about the data distribution
• e.g., logistic regression

• Non-parametric models:
• The number of parameters grows with the amount of training data
• Flexible with respect to the data distribution
• Slower, at risk of overfitting, and require more data
• e.g., K-nearest neighbors (KNN)
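
A minimal KNN sketch shows why it counts as non-parametric: there is no training step that compresses the data into a fixed set of weights. The "model" is the stored training set itself, so it grows with the data, and every prediction scans all of it. The data and the name `knn_predict` are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    # Every prediction looks at the whole training set (hence "slower").
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors with two classes.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(points, labels, (1.1, 1.0))
near_b = knn_predict(points, labels, (4.0, 4.0))
```

By contrast, a logistic regression model on the same two features would keep only three numbers (two weights and a bias) no matter how many training examples it saw.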

Supervised learning workflow