Machine Learning (1)
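Today I'm reviewing the machine learning concepts I studied in school.
Some application scenarios for machine learning
- Handwriting recognition
- Object recognition
- Sentiment analysis ("I love this course" is positive; "I will take another course" is negative)
- Remote sensing image classification (identifying the type of a region from an overhead view)
- Web search
- Speech recognition
- Spam email detection
- Robotics
- Healthcare
- Fingerprint recognition
- Face recognition
- Autonomous driving
- Tree species classification from satellite images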
Some free datasets
- Image/Video Databases (comprehensive): https://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm
- http://archive.ics.uci.edu/
- handwritten digits: http://yann.lecun.com/exdb/mnist/
- https://image-net.org/
Some terminology
- Classification
- Regression
- Clustering
- Label
- Features
- Loss function
- Cost function
- Accuracy
- Example / sample (a single data point)
- Training set (labeled samples)
- Test set (samples whose labels are hidden from the model)
- Model
- Algorithm
Some examples
Classification
Object classification and recognition
Training set and test set
Categories of machine learning
Supervised learning: labeled data
Unsupervised learning: unlabeled data
Reinforcement learning: reward system
Supervised learning
● Supervised Learning - Classification
○ Features: image/pixels
○ Label: Cat and Dog
● Task: Given a picture of a cat or a dog, predict its label.
● Supervised Learning - Regression
○ Features: size of house
○ Label: house price
● Task: Given a house’s size, predict the selling price.
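The regression task above can be sketched with a closed-form least-squares line fit. This is only a minimal illustration: the `fit_line` helper and the size/price numbers are made up for the example, and real housing data would not fall exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the feature, and how the feature and label vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: feature = house size, label = selling price.
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(sizes, prices)

# Predict the label (price) for a house size the model has not seen.
predicted_price = slope * 100 + intercept
print(slope, intercept, predicted_price)
```

The features go in as `xs`, the labels as `ys`, and the trained "model" is just the two numbers `slope` and `intercept`.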
Recap:
● A model is trained on historical data whose labels (the ground truth) are known, e.g. the previous house sales in the example above.
● Once the model is trained, it can be tested on new data to predict the label.
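One common way to get "new data" for testing is to hold out part of the labeled samples before training. The sketch below assumes a simple shuffled hold-out split; the `train_test_split` helper, the 25% fraction, and the data are all hypothetical choices for illustration.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for testing."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = samples[:]          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labeled samples: (house size, selling price).
data = [(size, 3 * size) for size in range(40, 120, 10)]

train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))
```

The model only ever sees `train_set` during training; `test_set` is used afterwards to check how well the predictions match the held-out labels.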
Unsupervised learning
● What if no ground truth is available?
● Unsupervised Learning - Clustering
○ Features: length and width measurements for different types of flowers.
○ Label: none. Unsupervised learning has no labels!
● Task: Cluster the data into groups of similar samples.
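The clustering task above can be sketched with k-means, one common clustering algorithm (the notes do not name a specific one, so this choice is an assumption). The flower measurements and the starting centroids below are made up for illustration.

```python
def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for x, y in points:
        dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids to the mean."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean_x = sum(p[0] for p in members) / len(members)
                mean_y = sum(p[1] for p in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical (length, width) measurements; note that no labels are given.
flowers = [(1.0, 0.5), (1.2, 0.4), (0.9, 0.6),
           (5.0, 2.0), (5.2, 2.1), (4.8, 1.9)]

# Start from two of the data points as initial centroids.
centroids, cluster_ids = kmeans(flowers, [flowers[0], flowers[3]])
print(cluster_ids)
```

The cluster indices are arbitrary group numbers, not labels: the algorithm only discovers that the samples fall into two similar groups.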
Reinforcement learning
Reinforcement learning discovers, through trial and error, which actions yield the greatest rewards.
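A tiny illustration of trial and error is the epsilon-greedy strategy on a multi-armed bandit. This is one simple setting chosen for the sketch, not something the notes prescribe; the fixed reward values, `epsilon`, and the `run_bandit` helper are all hypothetical.

```python
import random


def run_bandit(rewards, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error over a list of actions ("arms").

    rewards[i] is the fixed, hypothetical reward for choosing action i.
    Real environments usually return noisy rewards instead.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)   # current guess of each action's value
    counts = [0] * len(rewards)        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))      # explore: random action
        else:
            action = estimates.index(max(estimates))  # exploit: best so far
        reward = rewards[action]
        counts[action] += 1
        # Incremental mean of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.8, 0.5])
best_action = estimates.index(max(estimates))
print(best_action, counts)
```

The agent is never told which action is best; it only sees the reward after each try and gradually shifts toward the action with the highest estimated value.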
Deep learning
A family of machine learning methods that uses deep
architectures to learn high-level feature representations.
Parametric and non-parametric models
• Parametric models:
  • Have a fixed number of parameters
  • Faster to use, simpler
  • Make stronger assumptions about the data distribution
  • e.g., Logistic regression
• Non-parametric models:
  • The number of parameters grows with the amount of training data
  • Flexible about the data distribution
  • Slower, at risk of overfitting, and require more data
  • e.g., K-nearest neighbors (KNN)
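The contrast can be seen in what each model has to store at prediction time. In the sketch below, the logistic regression weights are hand-picked for illustration rather than fitted, and the training points are made up; only the difference in shape matters here.

```python
import math
from collections import Counter


def logistic_predict(weights, bias, x):
    """Parametric: the model is just len(weights) + 1 numbers."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # probability of the positive class


def knn_predict(train, query, k=3):
    """Non-parametric: the 'model' is the whole training set."""
    by_distance = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2
        + (item[0][1] - query[1]) ** 2,
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2D training points with labels "a" and "b".
train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
         ((6, 6), "b"), ((6, 7), "b"), ((7, 6), "b")]

# Hand-picked (not fitted) parameters: three numbers, however large the data.
weights, bias = [1.0, 1.0], -7.0

p_low = logistic_predict(weights, bias, (1.5, 1.5))
p_high = logistic_predict(weights, bias, (6.5, 6.5))
label_low = knn_predict(train, (1.5, 1.5))
label_high = knn_predict(train, (6.5, 6.5))
print(p_low, p_high, label_low, label_high)
```

Adding more training samples leaves `weights` and `bias` the same size, while `knn_predict` has to keep and search every sample, which is why KNN gets slower as the data grows.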
Supervised learning workflow
So how do we evaluate a model?
Supervised Learning - Classification
Evaluation Metrics
Accuracy, Recall, Precision, Confusion Matrix
Accuracy: the number of correctly classified samples divided by the total number of samples.
Confusion Matrix: a table of actual versus predicted labels for a classification problem.
Recall: the proportion of actual positives that were identified correctly.
Precision: the proportion of positive identifications that were actually correct.
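For a binary problem, all four metrics can be computed from the four cells of the confusion matrix. The sketch below uses made-up labels, with 1 as the positive class; the `confusion_counts` helper is a hypothetical name for this example.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # true positive
        elif p == positive:
            fp += 1   # false positive: predicted positive, actually negative
        elif a == positive:
            fn += 1   # false negative: a positive the model missed
        else:
            tn += 1   # true negative
    return tp, fp, fn, tn


# Hypothetical ground-truth labels and model predictions.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)

accuracy = (tp + tn) / (tp + fp + fn + tn)
# Guard against division by zero when nothing is predicted or actually positive.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(tp, fp, fn, tn, accuracy, precision, recall)
```

Here the model finds 2 of the 4 actual positives (recall 0.5), and 2 of its 3 positive predictions are right (precision about 0.67), which shows how the two metrics can differ on the same predictions.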