MKLINK OR MOVE.
[[역전파,backpropagation]]
[[계산그래프,computational_graph]]
[[인공지능,artificial_intelligence]]
dimensionality reduction: Korean term 차원축소 or 차원감소, tbd. [[dimensionality_reduction]]
[[엠니스트,MNIST]]
[[활성화함수,activation_function]]
 [[시그모이드함수,sigmoid_function]] - [[VG:시그모이드함수,sigmoid_function]]
 [[ReLU함수,ReLU_function]]
[[가중값,weight]]
 [[가중값초기화,weight_initialization]] - aka neural_network_initialization ?
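  aka (Korean) 가중치 초기화, 초기 가중치 설정
  [[zero_initialization]], if that was the name? - bad.
   [[Date(2023-09-28T01:40:53)]] Keep forgetting the drawback: symmetry. With all-zero weights every unit in a layer computes the same output and receives the same gradient, so the units never become different from one another. ... Ggl:신경망+zero+초기화+단점 Naver:신경망+zero+초기화+단점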
  [[He_initialization]]
  [[Xavier_initialization]]
  [[LeCun_initialization]]
  ...
  tmp bmks ko
  https://reniew.github.io/13/
  Up: [[가중값,weight]] [[초기화,initialization]] [[인공신경망,artificial_neural_network,ANN]]
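
A minimal NumPy sketch of the schemes listed above, assuming the usual normal-distribution variants: Xavier with variance 2/(fan_in + fan_out), He with 2/fan_in, LeCun with 1/fan_in. The helper name `init_weights` and the layer sizes are made up for illustration. The last lines show the zero-initialization case: every hidden unit outputs exactly the same value.

```python
import numpy as np

def init_weights(fan_in, fan_out, scheme, rng):
    """Return a (fan_in, fan_out) weight matrix for the given scheme."""
    if scheme == "zero":
        return np.zeros((fan_in, fan_out))
    if scheme == "xavier":      # variance 2 / (fan_in + fan_out)
        std = np.sqrt(2.0 / (fan_in + fan_out))
    elif scheme == "he":        # variance 2 / fan_in
        std = np.sqrt(2.0 / fan_in)
    elif scheme == "lecun":     # variance 1 / fan_in
        std = np.sqrt(1.0 / fan_in)
    else:
        raise ValueError("unknown scheme: " + scheme)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W_he = init_weights(256, 128, "he", rng)
W_xavier = init_weights(256, 128, "xavier", rng)

# Zero initialization: all hidden units are identical (here all exactly 0).
x = rng.normal(size=(4, 256))
h_zero = np.tanh(x @ init_weights(256, 128, "zero", rng))
```

He is usually paired with ReLU-type activations, Xavier with tanh or sigmoid; that pairing is common practice rather than something these notes state.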

----
Sub:
CNN [[convolutional_neural_network]]
RNN [[recurrent_neural_network]]
{
Handles sequential data,
so it is used for speech recognition, machine translation ...

RNNs are just neural networks that:
share weights across multiple layers,
take an input at each layer,
and have a variable number of layers

state_machine
internal_state

RNNs can be considered as layered, feed-forward networks with shared weights. // feedforward_network
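
The "shared weights, one input per layer, variable number of layers" view can be sketched as a loop. This assumes a vanilla tanh RNN; the names `rnn_forward`, `W_xh`, `W_hh`, `b_h` and the sizes are hypothetical.

```python
import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh, b_h):
    """Unrolled vanilla RNN: the same weights are reused at every time step."""
    h = h0
    hs = []
    for x_t in xs:                       # one "layer" per time step
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        hs.append(h)                     # h is the internal state carried forward
    return np.stack(hs)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 5
W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

# The same parameters handle sequences of different lengths.
hs_short = rnn_forward(rng.normal(size=(4, input_dim)), np.zeros(hidden_dim), W_xh, W_hh, b_h)
hs_long = rnn_forward(rng.normal(size=(9, input_dim)), np.zeros(hidden_dim), W_xh, W_hh, b_h)
```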

Backpropagation Through Time (BPTT)

Truncated BPTT
When a sequence is long, unrolling the RNN is prohibitive in both computation and memory
Backpropagated gradients are truncated after K steps
– Carry hidden states forward in time forever

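vanishing gradient : the gradient shrinks toward 0 as it is propagated back over more time steps
exploding gradient : the gradient grows explosively as it is propagated back over more time steps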
Very hard to capture long-term dependencies

gradient clipping : scaling down the gradients
– When the norm of the gradients exceeds a threshold (𝜂), rescale them so the norm equals the threshold
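
A sketch of clipping by global norm, assuming the gradients come as a list of NumPy arrays. The function name `clip_by_global_norm` is hypothetical and `threshold` plays the role of 𝜂.

```python
import numpy as np

def clip_by_global_norm(grads, threshold):
    """Rescale all gradients together if their joint L2 norm exceeds threshold."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > threshold:
        scale = threshold / total_norm
        grads = [g * scale for g in grads]   # direction kept, length shrunk
    return grads, total_norm

grads = [np.array([3.0, 4.0]), np.array([[12.0]])]    # joint norm = 13
clipped, norm_before = clip_by_global_norm(grads, threshold=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))

# Gradients already under the threshold pass through unchanged.
small, _ = clip_by_global_norm([np.array([0.3, 0.4])], threshold=1.0)
```

Clipping addresses exploding gradients only; it does nothing for vanishing ones.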

Resolving Vanishing Gradients // vanishing_gradient gradient_vanishing [[기울기소실,gradient_vanishing]] solutions
• A few approaches
 – Proper initialization of the weight matrices
 – Proper activation functions such as ReLU
  • Derivative of ReLU is either 0 or 1
 – Adopt gating mechanisms! // [[gating_mechanism]] ... Google:gating_mechanism
  • Long short-term memory (LSTM) // [[long_short-term_memory]]
  • Gated recurrent unit (GRU) // [[gated_recurrent_unit]]

LSTM (long short-term memory, Korean: 장단기기억)
{
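Korean name: 장단기 기억 ?
or 장단기 메모리 ?

A model for solving the long-term dependency problem of RNNs.

Long short-term memory((an information-processing model designed to remember information from the early part of a sequence for a long time, in order to mitigate the long-term dependency problem of deep-learning recurrent neural networks (data gathered early on fades further along, degrading the network's performance)))[* https://en.dict.naver.com/#/entry/enko/d7af4546c202458bb9eda39fa5dbabe4]

Addresses the vanishing gradient problem ([[VG:기울기소실,gradient_vanishing]]).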

input gate (Korean: 입력 게이트)
forget gate (Korean: 망각 게이트? 삭제 게이트?)
output gate (Korean: 출력 게이트)
cell state 
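
How the three gates and the cell state fit together, as a rough single-step sketch. This assumes the standard LSTM formulation without peephole connections; the packed weight layout (`W`, `U`, `b` holding all four blocks side by side) and the name `lstm_step` are choices made here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (input_dim, 4H), U: (H, 4H), b: (4H,)."""
    H = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[..., 0:H])            # input gate: how much new content to write
    f = sigmoid(z[..., H:2 * H])        # forget gate: how much old cell state to keep
    o = sigmoid(z[..., 2 * H:3 * H])    # output gate: how much of the cell to expose
    g = np.tanh(z[..., 3 * H:4 * H])    # candidate cell content
    c = f * c_prev + i * g              # cell state update (additive path)
    h = o * np.tanh(c)                  # hidden state
    return h, c

rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
W = rng.normal(scale=0.1, size=(input_dim, 4 * hidden))
U = rng.normal(scale=0.1, size=(hidden, 4 * hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(6, input_dim)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

With the forget gate near 1 and the input gate near 0, the cell state is copied forward almost unchanged, which is the mechanism that helps with long-term dependencies.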

WpKo:장단기_메모리
WpEn:Long_short-term_memory
[[기억,memory]] [[메모리,memory]]
}

GRU [[gated_recurrent_unit]]
{
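One of the gating mechanisms for RNNs.
Devised to solve the long-term dependency problem of conventional RNNs.

Simpler structure than LSTM.
Controls the information flow with an update gate that combines the roles of the input gate and the forget gate.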
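
A rough single-step sketch of a GRU, assuming the common formulation with an update gate `z` and a reset gate `r` (the reset gate is not mentioned in these notes). The sign convention for `z` varies between references; here `z` weights the new candidate and `1 - z` weights the old state. `gru_step` and the parameter tuple layout are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU step; params = (Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(x_t @ Wz + h_prev @ Uz + bz)               # update gate
    r = sigmoid(x_t @ Wr + h_prev @ Ur + br)               # reset gate
    h_tilde = np.tanh(x_t @ Wh + (r * h_prev) @ Uh + bh)   # candidate state
    # z acts like an input gate and (1 - z) like a forget gate, tied together.
    return (1.0 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(0)
input_dim, hidden = 3, 4

def make_params(update_bias):
    w = lambda rows: rng.normal(scale=0.1, size=(rows, hidden))
    return (w(input_dim), w(hidden), np.full(hidden, update_bias),
            w(input_dim), w(hidden), np.zeros(hidden),
            w(input_dim), w(hidden), np.zeros(hidden))

params = make_params(0.0)
h = np.zeros(hidden)
for x_t in rng.normal(size=(6, input_dim)):
    h = gru_step(x_t, h, params)
```

Unlike the LSTM there is no separate cell state; the hidden state itself is what gets carried forward.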

WpKo:게이트_순환_유닛
WpEn:Gated_recurrent_unit

}

WpKo:순환_신경망
WpEn:Recurrent_neural_network
}
DBN [[deep_belief_network]] { Published in 2006? Brought the long-neglected ANN back up as a mainstream AI methodology? ... rel. [[심층학습,deep_learning]] ? }
DNN [[deep_neural_network]] { rel. [[심층학습,deep_learning]]? }
FNN [[feedforward_neural_network]]
{
WpKo:순방향_신경망
WpEn:Feedforward_neural_network
}
FCN - [[fully_convolutional_network]] ... Google:fully_convolutional_network
{
encoder / down-sampling contraction path
decoder / up-sampling expansion path
}

...

CNN - mainly for [[이미지처리,image_processing]]?
RNN - 

Below: mostly image processing? CNNs? All classified under [[심층학습,deep_learning]]?

<<tableofcontents>>

= LeNet =
[[LeNet]]
{
Korean: 르넷?
Named after Yann_LeCun?
}
= GoogLeNet =
[[GoogLeNet]]
{
inception module?
}
= VGGNet =
[[VGGNet]]
{
pagename?
Korean: VGG넷 ?
Built by the VGG (Visual Geometry Group) at Oxford.
2nd place in ILSVRC 2014 (classification task).
16 layers (VGG16; a 19-layer VGG19 variant also exists).
}
= AlexNet =
[[AlexNet]]
{
8 layers (5 convolutional + 3 fully connected).
}
= ResNet =
[[ResNet]]
{
Korean: 레스넷?
[[잔차학습,residual_learning]] - [[잔차,residual]] [[VG:잔차,residual]]

F(x)+x=H(x)
F(x)=H(x)-x //// <- Residual

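A plain CNN has its stacked layers fit the target mapping H(x) directly,
whereas '''ResNet''' has the layers fit only the residual F(x) = H(x) - x and adds x back through a skip connection, so the block outputs F(x) + x.
x passes through unchanged, so all the learning happens in F(x). When the ideal mapping is close to the identity, training only has to drive F(x) toward 0 ([[최소화,minimization]] of the residual), which is easier than fitting an identity mapping through a stack of nonlinear layers.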

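F(x) : the value that has passed through the weight layers; this is the residual mapping
identity mapping = skip connection = identity shortcut connection : x : the value that bypasses the weight layers (skipped straight through); input and output have the same dimension
block output H(x) : the sum of the two above, i.e. F(x) + x
residual block : the unit structure?
residual network - a stack of residual blocks, i.e. ResNet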
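
A toy residual block to make F(x) + x concrete. This is a fully connected stand-in under simplifying assumptions: the real ResNet uses convolution layers with batch normalization and applies a ReLU after the addition, all omitted here. `residual_block`, `W1`, `W2` are hypothetical names.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Return F(x) + x, where F is two weight layers with a ReLU in between."""
    f = relu(x @ W1) @ W2    # F(x): the path through the weight layers
    return f + x             # skip connection: x is added back unchanged

rng = np.random.default_rng(0)
dim = 8
x = rng.normal(size=(2, dim))
W1 = rng.normal(scale=0.1, size=(dim, dim))
W2 = rng.normal(scale=0.1, size=(dim, dim))

# Stacking blocks gives a (toy) residual network; shapes stay the same throughout.
out = x
for _ in range(3):
    out = residual_block(out, W1, W2)

# If the weight path outputs 0, the block is exactly the identity.
identity_out = residual_block(x, W1, np.zeros((dim, dim)))
```

The last line is the point of the design: a block can fall back to the identity simply by pushing F(x) to 0.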


tmp bmks ko
(ResNet) Deep Residual Learning for Image Recognition paper review (in Korean)
https://tobigs.gitbook.io/tobigs/deep-learning/computer-vision/resnet-deep-residual-learning-for-image-recognition


... Google:ResNet Naver:ResNet
}
= DenseNet =
dense block
... Google:DenseNet Naver:DenseNet
= MobileNet =
Built on depth-wise separable convolution
• Depth-wise convolution & point-wise convolution
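
Why the split into depth-wise and point-wise convolution saves parameters, as a small count. Biases are ignored and the layer sizes (3x3 kernel, 64 to 128 channels) are made-up example values.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depth-wise k x k conv followed by a point-wise 1 x 1 conv."""
    depthwise = k * k * c_in     # one k x k filter per input channel
    pointwise = c_in * c_out     # 1 x 1 conv that mixes the channels
    return depthwise + pointwise

std_params = standard_conv_params(3, 64, 128)          # 73728
sep_params = depthwise_separable_params(3, 64, 128)    # 8768
ratio = sep_params / std_params                        # equals 1/c_out + 1/k**2
```

For this example the separable version needs roughly 12% of the weights of the standard convolution.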
= EfficientNet =
Makes scaling more flexible?
Balances these three: depth, width, resolution?

----
Misc

Naming things SomethingNet in PascalCase is a naming pattern that is also often used in [[언어학,linguistics]] (esp [[전산언어학,computational_linguistics]] [[말뭉치언어학,corpus_linguistics]]?? qqq ).

----
Twins:
[[Namu:인공신경망]]

Up:
[[신경망,neural_network]]
[[인공지능,artificial_intelligence]]

QQQ. Is an ANN a very large and complex [[regression_model]]? A generalization of one?