supervised learning (지도학습, supervised_learning)
unsupervised learning (비지도학습, unsupervised_learning)
reinforcement learning (강화학습, reinforcement_learning)

classification (분류)
classifier (분류기)
binary classification (이진분류, binary_classification)
binary classifier (이진분류기, binary_classifier)


1. The goal of learning

For an output variable $\displaystyle Y$ and input variables $\displaystyle X=(X_1,X_2,\cdots,X_p)$, the relationship between the two can be written as

$\displaystyle Y=f(X)+\epsilon$

The goal of learning is to estimate the function $\displaystyle f$ from the given observed data.
(ㄷㅎㅈ 1-2)

In other words, learning means constructing a function.
In concrete terms the process is closer(?) to repeatedly updating weights, using a computational graph, backpropagation, etc.
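A minimal sketch of "estimating f by updating weights", assuming a linear model f(x) = w*x + b and plain gradient descent on the mean squared error. The synthetic data, the learning rate and the number of steps are made up for illustration and are not from the lecture.

```python
import random

random.seed(0)

# Synthetic observations from an assumed "true" relationship Y = 2*X + 1 + noise.
xs = [i / 200 for i in range(200)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 0.1) for x in xs]

# Model f(x) = w*x + b with the weights (w, b) initialised to zero.
w, b = 0.0, 0.0
learning_rate = 0.5  # hypothetical step size chosen for this toy problem

for _ in range(2000):
    # Gradient of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Weight update: move against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"estimated f(x) = {w:.2f} * x + {b:.2f}")
```

The estimated weights should end up close to the assumed 2 and 1. In a neural network the same kind of update is applied to many weights, with the gradients obtained by backpropagation.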


2. Types of machine learning

Supervised learning (supervised_learning)

The data include Y, and the problem is to find the $\displaystyle f$ that predicts or explains Y well.
This is the problem mainly dealt with in this course and in practice.

Unsupervised learning (unsupervised_learning)

Y is not specified, and the interest lies not in Y but in the patterns of X itself. Examples:
cluster analysis (cluster_analysis; the wiki page name is cluster_analysis, not clustering_analysis, as of 2023-10-05)
Sub: classification-based cluster analysis, which is classed as semi-supervised learning rather than unsupervised learning
dimensionality reduction (dimensionality_reduction, also written "dimension reduction")
and so on
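A small sketch of cluster analysis as an unsupervised problem: only X is given and the structure is found from X alone. k-means is used here as one common clustering method (the note does not name a specific algorithm); the data points and the initial centers are made up.

```python
def k_means_1d(points, centers, iterations=10):
    """Plain k-means on 1-D data; returns the final centers and cluster labels."""
    centers = list(centers)  # do not modify the caller's list
    labels = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: abs(p - centers[k]))
                  for p in points]
        # Update step: move each center to the mean of its assigned points.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = sum(members) / len(members)
    return centers, labels

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]  # only X is given; there is no Y
centers, labels = k_means_1d(points, centers=[0.0, 6.0])
print(centers, labels)
```

The two groups are recovered without any labels being supplied.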

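Semi-supervised learning (semi-supervised_learning; self-supervised learning is a related term)

Pronounced "semi-" as 세마이-.

Has characteristics of both supervised and unsupervised learning.
Examples: making use of unlabeled data (unlabeled_data), classification-based cluster analysis, etc.
Roughly halfway between supervised and unsupervised learning, but not as important a category as the three main ones (supervised, unsupervised, reinforcement). (ㅅㅈㅎ)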

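Reinforcement learning (reinforcement_learning)

Learning a policy that achieves the maximum reward in a given environment. (보상 here means "reward" rather than "compensation".)
Learns from strategically selected data rather than randomly sampled data.
In terms of $\displaystyle X\overset{f}{\to}Y$, the aim here is to find $\displaystyle X$ rather than $\displaystyle f$. (ㅅㅈㅎ)

In short, machine learning can be divided into these three broad categories.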

(ㄷㅎㅈ 1-2 32m)
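A toy sketch of "learning a policy that maximizes reward from strategically selected data", assuming a three-armed bandit environment and an epsilon-greedy policy. Both are illustrative choices, not something the lecture specifies, and the reward probabilities are made up.

```python
import random

random.seed(0)

true_reward_prob = [0.2, 0.5, 0.8]  # hidden from the agent (assumed toy environment)
value_estimate = [0.0, 0.0, 0.0]    # the agent's running estimate of each action's reward
pull_count = [0, 0, 0]
epsilon = 0.1                       # fraction of steps spent exploring

for _ in range(5000):
    # Policy: mostly exploit the best-looking action, sometimes explore at random.
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: value_estimate[a])
    reward = 1.0 if random.random() < true_reward_prob[action] else 0.0
    # Incremental mean update of the chosen action's estimated value.
    pull_count[action] += 1
    value_estimate[action] += (reward - value_estimate[action]) / pull_count[action]

best_action = max(range(3), key=lambda a: value_estimate[a])
print(best_action, pull_count)
```

The agent decides which action to try next, so the data it learns from are chosen by its own policy rather than drawn at random.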


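3. Machine learning procedure

1. Problem definition
What is the dependent variable Y?

2. Data collection

(Traditional analysis) There is no initial data, so an experiment is designed and carried out to collect data.
(Big data analysis) All relevant data are collected from databases that already exist.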

3. Exploratory data analysis (EDA)

The process of learning about the data.
Includes visualization and the search for missing values and outliers.
Data preprocessing is also part of this step.

4. Main data analysis (predictive model)

Start from clean data (n=100M, p=10k).
Split the data into a training set (used for fitting) and a test set (used for evaluation), usually in chronological order.
Remove unnecessary input variables (feature selection). // whether this corresponds exactly to feature selection: TBW
Select candidate learning models (about 4 to 5 candidates, based on the EDA).
Select a model using (cross-)validation. // "cross" is in parentheses, so which validation methods other than cross-validation are there? TBW
Evaluate the final performance on the test set.

(Afterwards) Evaluate again on a completely new dataset (field test).

(ㄷㅎㅈ 1-2 35m-40m)
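The analysis steps above (chronological split, candidate models, model selection by cross-validation, final test-set evaluation) can be sketched as follows. The data, the two candidate models and the interleaved 5-fold scheme are assumptions made for this illustration.

```python
import random

random.seed(0)

# Toy time-ordered data: y depends linearly on x (an assumption for this sketch).
data = [(i / 100, 3.0 * (i / 100) + random.gauss(0.0, 0.2)) for i in range(100)]

# Chronological split: the earliest 80% for training, the latest 20% for testing.
train, test = data[:80], data[80:]

def fit_mean(rows):
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y

def fit_linear(rows):
    # Closed-form least squares for a single input variable.
    mx = sum(x for x, _ in rows) / len(rows)
    my = sum(y for _, y in rows) / len(rows)
    slope = (sum((x - mx) * (y - my) for x, y in rows)
             / sum((x - mx) ** 2 for x, _ in rows))
    return lambda x: my + slope * (x - mx)

def mse(model, rows):
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

def cross_val_mse(fit, rows, k=5):
    # k-fold cross-validation: each fold is held out once for validation.
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        rest = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(mse(fit(rest), folds[i]))
    return sum(scores) / k

candidates = {"mean": fit_mean, "linear": fit_linear}
cv_scores = {name: cross_val_mse(fit, train) for name, fit in candidates.items()}
chosen = min(cv_scores, key=cv_scores.get)

# Final evaluation: refit the chosen model on all training data, score on the test set once.
final_model = candidates[chosen](train)
print(chosen, cv_scores, mse(final_model, test))
```

The test set is touched only once, after the model has been chosen, which is what keeps the final performance estimate honest.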


3.1. CRISP-DM

CRISP-DM is a more concrete form of these steps, seen from the business side.
CRISP-DM (Cross-industry standard process for data mining): a standard for the general procedure (process) of data mining.

(In the diagram, the following items appear as a state transition diagram.)

business understanding
data understanding
data preparation - EDA is sometimes carried out here
modeling
evaluation
deployment


4. ROC

ROC curve (ROC_curve)
ROC (Receiver Operating Characteristic) curve

ROC, TPR, FPR ... via https://angeloyeo.github.io/2020/08/05/ROC.html
and https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc?hl=ko
{
rate
TPR : true positive rate, also called recall
FPR : false positive rate

AUC : area under the curve
the area under the ROC curve

ex. a doctor diagnosing cancer
Depending on where the threshold is set:
diagnosing almost anyone with cancer: more people without cancer are diagnosed as having it (more false positives)
being overly cautious about diagnosing cancer: more people with cancer are told they do not have it (more false negatives)

The closer the ROC curve is to the top-left corner, the better the binary classifier.

}
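A minimal sketch of how the ROC curve and AUC follow from sweeping the threshold. The scores and labels are made up, and the code assumes there are no tied scores.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the threshold from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    # Lowering the threshold past each score turns that example into a predicted positive.
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # FPR = FP / (FP + TN), TPR = TP / (TP + FN)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]  # made-up classifier scores
labels = [1, 1, 0, 1, 0, 1, 0, 0]                   # 1 = has cancer, 0 = does not
points = roc_points(scores, labels)
print(auc(points))  # 0.8125
```

A very low threshold corresponds to the top-right end of the curve (everyone diagnosed, FPR near 1), a very high one to the bottom-left end (almost nobody diagnosed, TPR near 0).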


5. AUC

AUC: area under the curve
the area under the curve

rel. ROC curve (ROC_curve)

https://wikidocs.net/151503


6. perceptron

perceptron
{

multi-layer perceptron (multi-layer_perceptron, MLP)
{
$\displaystyle \mathbf{o}=\tau\left( \mathbf{W}^L \sigma \left( \cdots \mathbf{W}^3 \sigma \left( \mathbf{W}^2 \sigma(\mathbf{W}^1 \mathbf{x} ) \right) \right) \right)$

$\displaystyle \sigma$ appears to be a hidden-layer activation function such as the sigmoid.
$\displaystyle \tau$ is presumably the activation function of the output layer (a threshold? not confirmed).
}

}
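A sketch of the forward pass in the MLP formula above, assuming that σ is the sigmoid and that τ is a softmax output activation. The choice of τ is an assumption, since the note leaves it open, and the weight values are hypothetical.

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def mlp_forward(weights, x, sigma=sigmoid, tau=softmax):
    """Compute o = tau(W^L sigma(... sigma(W^1 x)))."""
    h = x
    for W in weights[:-1]:
        h = sigma(matvec(W, h))         # hidden layers use sigma
    return tau(matvec(weights[-1], h))  # the last layer uses tau

# Hypothetical weights: 2 inputs -> 3 hidden units -> 2 outputs (no bias terms, as in the formula).
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
W2 = [[1.0, -1.0, 0.5], [-0.5, 0.2, 0.3]]
o = mlp_forward([W1, W2], [1.0, 2.0])
print(o)
```

With a softmax as τ the output components are positive and sum to 1, so they can be read as class probabilities.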


7. ground truth

ground_truth
{

Newly written below.

'ground truth'...? Korean translations seen:
실측 자료 ("measured data") - this term seems to come from measured data in meteorology..?
...

Not something that is actually true (truth), but rather

ideal expected result
desired output

Difference from a label:
a label is a definite answer with a fixed value,
whereas ground truth is the answer that the model is meant to produce

MKLINK
learning
machine learning (machine_learning)

Sources:
https://wikidocs.net/169014
}


8. medoid

medoid
Korean translations seen:
중간점 ("middle point")

Mentioned in cluster analysis (cluster_analysis)

k-medoids clustering etc.

https://en.wikipedia.org/wiki/K-medoids

https://en.wikipedia.org/wiki/Medoid
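The note does not define the term, so the following sketch uses the usual definition as given on the linked Medoid page: the medoid is the member of the data set whose total distance to all other members is smallest. The points are made up, and Euclidean distance is an assumed choice.

```python
import math

def medoid(points):
    """Return the member of `points` with the smallest total distance to all members."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
centroid = tuple(sum(coord) / len(points) for coord in zip(*points))

print(medoid(points))  # (1, 1)
print(centroid)        # the mean, which is not one of the data points here
```

Unlike the mean used by k-means, the medoid is always an actual data point, which is the idea behind k-medoids clustering.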


9. transfer learning (전이학습?)

transfer_learning

re-use weights?


10. ensemble learning (앙상블학습)

ensemble learning

"ensemble model" [1] and
"ensemble method" (we)
are also commonly seen expressions.... what is the difference?

https://ko.wikipedia.org/wiki/앙상블_학습법

"앙상블 학습법 (English: ensemble learning method)"

https://en.wikipedia.org/wiki/Ensemble_learning

rel?

Ensemble_averaging_(machine_learning)
= https://en.wikipedia.org/wiki/Ensemble_averaging_(machine_learning)


11. learning rate (학습율 / 학습률)

learning_rate

For now the page exists as 학습율,learning_rate.
Both Korean spellings, 학습율 and 학습률, are seen (학습률 is the standard spelling) ... rename the page?


12. Machine learning techniques / methodologies / approaches ... from DLwJS

from
Deep Learning with JavaScript: Neural networks in TensorFlow.js
2020


12.1. naive Bayes classifier

naive_Bayes_classifier
rel naive_Bayes_classification ?

naive Bayes classifier (나이브 베이즈 분류기; "naive" = 순진한)

Bayes' theorem (Bayes_theorem) is a way of estimating the probability of an event given
① the prior probability (belief) that the event occurs, and
② observed facts related to the event (features).
The theorem can be used to classify an observed data point into whichever of several known categories has the highest probability (likelihood).
The naive Bayes classifier is based on the assumption that the observed facts are all independent of one another (more precisely, conditionally independent given the category), which is why it is called "naive".
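A small sketch of a naive Bayes classifier for binary features, assuming add-one smoothing. The spam/ham data and the feature meanings are made up for illustration and are not from the book.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows):
    """rows: list of (features, category); features is a tuple of 0/1 values."""
    class_counts = Counter(cat for _, cat in rows)
    # feature_counts[(category, feature position)][value] = how often that value was seen
    feature_counts = defaultdict(Counter)
    for features, cat in rows:
        for i, value in enumerate(features):
            feature_counts[(cat, i)][value] += 1
    return class_counts, feature_counts

def classify(model, features):
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for cat, count in class_counts.items():
        score = count / total  # prior P(category)
        for i, value in enumerate(features):
            # Naive assumption: multiply the per-feature likelihoods P(value | category).
            # Add-one smoothing (2 possible values per feature) avoids zero probabilities.
            score *= (feature_counts[(cat, i)][value] + 1) / (count + 2)
        scores[cat] = score
    return max(scores, key=scores.get)

# Made-up training data: (contains "free", contains "meeting") -> category
rows = [
    ((1, 0), "spam"), ((1, 0), "spam"), ((1, 1), "spam"),
    ((0, 1), "ham"), ((0, 1), "ham"), ((0, 0), "ham"),
]
model = train_naive_bayes(rows)
print(classify(model, (1, 0)), classify(model, (0, 1)))  # spam ham
```

The scores are proportional to the posterior probabilities, so picking the largest score picks the most probable category.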


12.2. logistic regression

logistic regression (logistic_regression)
The first algorithm a data scientist tries on a classification problem.
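A minimal sketch of logistic regression as a binary classifier, assuming one input variable and plain gradient descent on the log loss. The data, the learning rate and the step count are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Made-up 1-D data: small x belongs to class 0, large x to class 1.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = 0.0, 0.0
learning_rate = 0.1  # hypothetical value for this toy problem
for _ in range(5000):
    # Gradient of the average log loss for the model p(y=1 | x) = sigmoid(w*x + b).
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

def predict(x):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

print([predict(x) for x in xs])
```

The model outputs a probability, and the 0.5 cut-off used in `predict` is the kind of threshold that the ROC curve varies.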


12.3. kernel method

kernel method (kernel_method?)

A method for binary classification problems.
It maps the original data into a higher-dimensional space and
finds a transformation that maximizes the distance (margin) between the two classes.
The best-known example is the support vector machine (SVM, support_vector_machine).

rel. kernel
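A sketch of the "mapping into a higher-dimensional space" part only, assuming the degree-2 polynomial kernel as the example. It shows that the kernel value computed in the original 2-D space equals the inner product after an explicit map into 3-D. Margin maximization (the SVM training itself) is not shown.

```python
import math

def phi(x):
    """Explicit map from a 2-D input to a 3-D feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed without leaving the original space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, z), dot(phi(x), phi(z)))  # both are 1, up to rounding
```

Because only inner products are needed, the higher-dimensional coordinates never have to be computed explicitly, which is what makes the approach practical.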


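12.4. decision tree

decision tree (decision_tree)

A flowchart-like structure.
It classifies an input data point, or
predicts an output value for a given input.
At each step of the flowchart it answers a simple yes/no question such as "Is feature X greater than some threshold?", and depending on the answer it moves on to one of two further yes/no questions. When the end of the flowchart is reached, the final answer is obtained.
Easy for people to understand and visualize.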
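The flowchart description can be sketched as a nested dictionary plus a loop that follows the yes/no answers. The tree below is written by hand with hypothetical features and thresholds. It is not learned from data.

```python
# Internal nodes ask "is feature > threshold?"; leaves hold the final answer.
tree = {
    "feature": "temperature", "threshold": 30,
    "yes": {"answer": "stay home"},
    "no": {
        "feature": "humidity", "threshold": 80,
        "yes": {"answer": "stay home"},
        "no": {"answer": "go out"},
    },
}

def predict(node, sample):
    """Follow yes/no answers from the root until a leaf is reached."""
    while "answer" not in node:
        branch = "yes" if sample[node["feature"]] > node["threshold"] else "no"
        node = node[branch]
    return node["answer"]

print(predict(tree, {"temperature": 25, "humidity": 60}))  # go out
print(predict(tree, {"temperature": 35, "humidity": 60}))  # stay home
```

Each prediction is a short path of readable questions, which is why such trees are easy to inspect and visualize.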
