The Relationship Between Batch Size and Model Performance

Batch Normalization
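
For reference, the batch normalization transform from the paper linked under Reference can be written as below for a mini-batch of m values x_1, ..., x_m. Here ε is a small constant for numerical stability, and γ and β are learned scale and shift parameters.

```latex
\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i
\qquad
\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2
\qquad
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}
\qquad
y_i = \gamma \hat{x}_i + \beta
```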

You don't need to understand the formula above in detail; it is enough to know that the output is computed from the mean and variance of the mini-batch.

If you are interested, the paper is linked in the Reference section below.

So the conclusion is as follows.

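In the end, Batch Normalization depends on the mean and variance of the mini-batch.
Because these statistics are estimated from the samples in the mini-batch, they change as the batch size changes.
Therefore, batch size affects Batch Normalization.
In other words, model performance can vary with batch size!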
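
The chain of reasoning above can be checked with a small NumPy sketch. It is a simplified illustration, not the full layer: the learned scale and shift are left out, and the data and the `batch_norm` helper are made up for this example. The same sample is normalized once inside a batch of 4 and once inside a batch of 64.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature using the mean and variance of the mini-batch.

    x has shape (batch_size, num_features). The learned scale (gamma) and
    shift (beta) are omitted to keep the sketch short.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)  # biased variance over the mini-batch
    return (x - mean) / np.sqrt(var + eps)

# Made-up data: 64 samples with a single feature.
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=(64, 1))

# The first sample, normalized with statistics from 4 vs. 64 samples.
sample_in_small_batch = batch_norm(data[:4])[0, 0]
sample_in_large_batch = batch_norm(data[:64])[0, 0]
print(sample_in_small_batch, sample_in_large_batch)
```

The two printed values differ because the mean and variance were estimated from different sets of samples, even though the input sample is identical.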

Batch Size Summary

As batch size increases...	As batch size decreases...
Noise ↓	Noise ↑
Generalization ↓	Generalization ↑
Less sensitive to outliers	More sensitive to outliers
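
The noise row of the table can be illustrated with a toy simulation. This is a sketch under loud assumptions: the "per-sample gradients" are just random numbers drawn around a true value of 1.0, and `minibatch_estimate_std` is a hypothetical helper. It says nothing about generalization; it only shows how the spread of a mini-batch average shrinks as the batch grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: stand-in "per-sample gradients" with true mean 1.0 and unit noise.
per_sample_grads = rng.normal(loc=1.0, scale=1.0, size=100_000)

def minibatch_estimate_std(grads, batch_size, num_batches=1000):
    """Spread (standard deviation) of the mini-batch mean across many random batches."""
    idx = rng.integers(0, len(grads), size=(num_batches, batch_size))
    return grads[idx].mean(axis=1).std()

for batch_size in (4, 32, 256):
    print(batch_size, minibatch_estimate_std(per_sample_grads, batch_size))
```

The spread falls roughly in proportion to one over the square root of the batch size, which is why larger batches give a less noisy estimate and a single outlier moves the average less.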

Tips.

Commonly used batch sizes by task are as follows.

Classification: 32, 64, 128, ... (or larger)
Object Detection & Segmentation: small values from 2 to 8
NLP: 32 or more for large models; 4 to 16 for small models or small datasets
GAN: 2 to 16

Source: ChatGPT, so treat these numbers only as a rough guide.

(For reference, the paper "Rethinking 'Batch' in BatchNorm" (see Reference) recommends 32 to 128.)

Why batch size is set to a power of two

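GPU memory is generally organized in power-of-two sizes.
This makes memory allocation and management easier, since memory divides evenly into uniform chunks.
As a result, the usual argument goes, a power-of-two batch size lets the GPU use its memory efficiently.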
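
As a small how-to following from the convention above: if you have estimated that at most n samples fit in memory, you might round that number down to a power of two. The helper below is hypothetical and only does the rounding; estimating n is up to you.

```python
def largest_power_of_two_at_most(n: int) -> int:
    """Return the largest power of two that does not exceed n (n must be >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # bit_length() gives the position of the highest set bit.
    return 1 << (n.bit_length() - 1)

# Example: if roughly 100 samples fit in memory, use a batch size of 64.
print(largest_power_of_two_at_most(100))  # 64
```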

Reference

Batch Normalization
https://arxiv.org/abs/1502.03167

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Batch Size
https://arxiv.org/abs/2105.07576

Rethinking "Batch" in BatchNorm
