supervised learning (supervised_learning)
unsupervised learning (unsupervised_learning)
reinforcement learning (reinforcement_learning)

classification (classification)
classifier (classifier)
binary classification (binary_classification)
binary classifier (binary_classifier)

1. Goal of learning

2. Types of machine learning

3. The machine learning procedure

3.1. CRISP-DM

9. Transfer learning?

10. Ensemble learning

11. Learning rate

12. tmp video ko

12.1. Docceptor machine learning


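1. Goal of learning

For an output variable $\displaystyle Y$ and input variables $\displaystyle X=(X_1,X_2,\cdots,X_p)$, the relationship between the two can be written as

$\displaystyle Y=f(X)+\epsilon$

where $\displaystyle \epsilon$ is the error term. The goal of learning is to estimate the function $\displaystyle f$ from the given observed data (observed_data?).
(ㄷㅎㅈ 1-2)

In other words, learning means constructing a function.
In practice the process looks more like repeatedly updating weights? using a computational graph (computational_graph), backpropagation, etc...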
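As a minimal sketch of "estimating f by updating weights", the snippet below fits a straight line to noisy data with plain gradient descent. The linear form of f, the learning rate `lr`, the epoch count, and the synthetic data are all assumptions made for illustration; for a model this small the gradients are written out by hand, whereas backpropagation would compute them automatically on a computational graph.

```python
import random


def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Estimate f(x) = w*x + b by repeatedly updating the weights w and b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Weight update: step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic observed data following Y = f(X) + eps with f(x) = 3x + 1 (assumed).
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [3.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]

w_hat, b_hat = fit_linear(xs, ys)
```

Here `w_hat` and `b_hat` should land close to the assumed true values 3 and 1; the remaining gap comes from the noise term.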


2. Types of machine learning

supervised learning (supervised_learning)

Y is given in the data; the problem is to find an $\displaystyle f$ that predicts/explains Y well
The main problem dealt with in this course and in practice

unsupervised learning (unsupervised_learning)

Y is not specified, or is of no interest; the interest is in the patterns of X itself
cluster analysis (cluster_analysis; preferred term is "cluster analysis", not "clustering analysis", 2023-10-05)
Sub: classification-based cluster analysis - this is classified as semi-supervised_learning rather than unsupervised learning
dimensionality reduction (dimensionality_reduction; also written "dimension reduction")
etc.
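To make "interest in the patterns of X itself" concrete, here is a small k-means sketch, one common cluster analysis method (the notes do not name a specific algorithm, so this choice is an assumption). No Y is used anywhere; the two synthetic blobs, `k=2`, and the seed are illustrative values.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on 2-D points: alternate assignment and center updates."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centers, labels


# Two well-separated synthetic blobs; only X is given, there is no Y.
rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
blob_b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(20)]
centers, labels = kmeans(blob_a + blob_b, k=2)
```

The cluster indices themselves are arbitrary; what matters is that points from the same blob end up sharing an index.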

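semi-supervised learning (cf. self-supervised learning, a related but distinct term)

pronounced "sem-eye" (세마이-)

Has characteristics of both supervised and unsupervised learning
Use of unlabeled data (unlabeled_data), classification-based cluster analysis, etc.
semi-supervised_learning
Roughly midway between supervised and unsupervised learning. However, it is not as important a category as the three main ones (supervised, unsupervised, reinforcement) (ㅅㅈㅎ)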

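reinforcement learning (reinforcement_learning)

Learning a policy that achieves the maximum reward in a given environment (보상 is better rendered as "reward" than as "compensation")
Learns from strategically selected data rather than randomly sampled data // strategy, selection?
In $\displaystyle X\overset{f}{\to}Y$, the aim here is to find $\displaystyle X$ rather than $\displaystyle f$. (ㅅㅈㅎ)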
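A rough illustration of "learning a policy from strategically selected data": the sketch below runs an epsilon-greedy agent on a multi-armed bandit, which is about the simplest reinforcement learning setting. The bandit setup, the reward probabilities, `eps`, and the step count are assumptions chosen for the example and do not come from the lecture.

```python
import random


def run_bandit(true_probs, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-looking arm, sometimes explore."""
    rng = random.Random(seed)
    k = len(true_probs)
    q = [0.0] * k      # estimated reward of each arm
    counts = [0] * k   # how often each arm was chosen
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(k)                    # explore
        else:
            arm = max(range(k), key=lambda a: q[a])   # exploit
        # The environment returns a reward of 1 with the arm's hidden probability.
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]     # running average update
    return q, counts


q, counts = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(q)), key=lambda a: q[a])
```

The data the agent sees is not a random sample: `counts` ends up heavily skewed toward the arm the policy currently believes is best, which is the sense in which the data is chosen strategically.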

Anyway, the point is that machine learning can be divided 'broadly into three' categories in this way.

(ㄷㅎㅈ 1-2 32m)


3. The machine learning procedure

1. Problem definition
What is the dependent variable (dependent_variable) Y?

2. Data collection // data_collection ??

(Traditional analysis) There is no data at the start, so an experiment is designed and carried out to collect data.
(Big data analysis) All relevant data is collected from databases that already exist.

3. Exploratory data analysis (EDA, exploratory_data_analysis)

The process of learning about the data
Includes visualization and the search for missing values and outliers
Data preprocessing is also part of this step.

4. Main data analysis (predictive model) // prediction_model ?

Start from clean data (clean_data ??) (n=100M, p=10k)
Split into a training set (training_set, used for training) and a test set (test_set, used for evaluation), usually in chronological order
Remove unnecessary input variables (feature selection) // feature_selection; the lecture note says "dependent variables", but the features being selected are the inputs X. TBW
Select candidate learning models (about 4~5 candidates, guided by the EDA)
Select the model using (cross-)validation

// "cross" is in parentheses, so which validation methods other than cross-validation are there (e.g. a single hold-out validation set)?
// i.e. model_selection using cross_validation
Evaluate the final performance using the test set (test_set)

(Afterwards) Evaluate again on a completely new dataset (field test, field_test)

(ㄷㅎㅈ 1-2 35m-40m)
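The later steps of this procedure (chronological split, candidate models, selection by cross-validation, final test evaluation) can be sketched as follows. The two candidate models, the synthetic data, the 80/20 split, and the interleaved 5-fold scheme are assumptions picked to keep the example short; a real project would have more candidates and would typically use time-aware folds.

```python
import random


def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cv_score(fit, xs, ys, k=5):
    """Average validation MSE over k folds (fold i holds out every k-th row)."""
    scores = []
    for i in range(k):
        tr = [j for j in range(len(xs)) if j % k != i]
        va = [j for j in range(len(xs)) if j % k == i]
        model = fit([xs[j] for j in tr], [ys[j] for j in tr])
        scores.append(mse(model, [xs[j] for j in va], [ys[j] for j in va]))
    return sum(scores) / k


# Synthetic rows, assumed to be already sorted by time.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

# Chronological split: earliest 80% for training, latest 20% for testing.
split = len(xs) * 8 // 10
x_train, y_train = xs[:split], ys[:split]
x_test, y_test = xs[split:], ys[split:]

# Model selection uses only the training set.
candidates = {"mean": fit_mean, "line": fit_line}
cv = {name: cv_score(fit, x_train, y_train) for name, fit in candidates.items()}
best_name = min(cv, key=cv.get)

# The test set is touched once, for the final performance estimate.
final_model = candidates[best_name](x_train, y_train)
test_mse = mse(final_model, x_test, y_test)
```

Keeping the test set out of the selection loop is the point of the design: `cv` decides which model wins, and `test_mse` only reports how the winner does on data it has never influenced.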


3.1. CRISP-DM

The business-oriented formalization of these steps is called CRISP-DM.
CRISP-DM (Cross-industry standard process for data mining) : a standard for the general procedure (process?) of data mining (data_mining)

(In the diagram, the following items appear as a state_transition_diagram)

business understanding
data understanding
data preparation - EDA is sometimes carried out here
modeling
evaluation
deployment


4. ROC

ROC curve (ROC_curve)
ROC (Receiver Operating Characteristic) curve

ROC, TPR, FPR ... via https://angeloyeo.github.io/2020/08/05/ROC.html
and https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc?hl=ko
{
rate
TPR : true positive rate, also called recall
FPR : false positive rate

AUC : area under the curve
the area under the ROC curve

ex. a doctor diagnosing cancer
Depending on where the threshold is set,
Diagnosing almost anyone as having cancer: more cases where a patient without cancer is diagnosed with cancer (FPR rises)
Being overly cautious about diagnosing cancer: more cases where a patient with cancer is diagnosed as cancer-free (TPR falls)

The closer the ROC_curve hugs the upper-left corner, the better the binary classifier.

}
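The threshold trade-off above can be sketched by sweeping a threshold over classifier scores and recording one (FPR, TPR) point per threshold. The scores and labels below are made-up illustrative values, and `roc_points` is a hypothetical helper name, not a library function.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called positive
    for t in sorted(set(scores), reverse=True):
        # Predict positive whenever the score reaches the threshold t.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Illustrative scores (higher = more likely positive) and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
```

Lowering the threshold only ever moves the point up or to the right: the "call everyone positive" end of the sweep is (1, 1) and the "call no one positive" end is (0, 0).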


5. AUC

AUC: area under the curve

rel. ROC curve (ROC_curve)

https://wikidocs.net/151503
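One way to compute the AUC without drawing the curve relies on the fact that the area under the ROC curve equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (ties counted as half). The sketch below uses that pairwise definition; the example scores and labels are made up for illustration.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
value = auc(scores, labels)  # 13 of 16 pairs are ordered correctly: 0.8125
```

A value of 1.0 means every positive outranks every negative, while 0.5 is what random scoring would give on average.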


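6. perceptron

perceptron (perceptron)
{

multi-layer perceptron (multi-layer_perceptron, MLP)
{
$\displaystyle \mathbf{o}=\tau\left( \mathbf{W}^L \sigma \left( \cdots \mathbf{W}^3 \sigma \left( \mathbf{W}^2 \sigma(\mathbf{W}^1 \mathbf{x} ) \right) \right) \right)$

$\displaystyle \sigma$ looks like a hidden-layer activation such as the sigmoid
$\displaystyle \tau$ is applied only after the last layer $\displaystyle \mathbf{W}^L$, so it is presumably the output activation; whether it is a threshold function is unconfirmed
}

}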
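The MLP formula can be read as a loop over weight matrices: apply $\sigma$ after every layer except the last, then apply $\tau$ to the final output. The sketch below follows that reading. Taking $\sigma$ to be the sigmoid and $\tau$ to be softmax are assumptions (the notes leave $\tau$ open), and, as in the formula, no bias terms are included.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mlp_forward(weights, x, sigma=sigmoid, tau=softmax):
    """Compute o = tau(W^L sigma(... sigma(W^1 x))) for weights = [W^1, ..., W^L]."""
    h = x
    for W in weights[:-1]:
        h = [sigma(v) for v in matvec(W, h)]  # hidden layers use sigma elementwise
    return tau(matvec(weights[-1], h))        # the output layer uses tau


# A tiny 3-input, 2-hidden, 2-output network with arbitrary illustrative weights.
W1 = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]]
W2 = [[1.0, -1.0], [-0.5, 0.5]]
o = mlp_forward([W1, W2], [1.0, 2.0, 3.0])
```

Passing a different `tau` (for example a step function) changes only the last line of `mlp_forward`, which is one way to try out the "threshold" reading of $\tau$.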
