Convolution Neural Network

CNN은 데이터를 3차원으로 모델에 집어 넣습니다. Convolution은 이미지 특징을 찾아냅니다.

python: conv-forword
tensorflow: tf.nn.conv2d
keras: Conv2D

이미지뿐만 아니라 nlp, 신호처리에도 쓰였습니다. 이 때는 Conv1D로 1차원으로 데이터를 집어넣습니다.

데이터 로드

CNN 예시를 사용할 예시 데이터 mnist를 불러오도록 하겠습니다.

import tensorflow as tf
from tensorflow.keras import datasets, layers, models

(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()

# cnn 은 depth를 하나 더 만들어줘서 3차원으로 만든다. 
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))

# 픽셀 값을 0~1 사이로 정규화합니다.
train_images, test_images = train_images / 255.0, test_images / 255.0

모델 정의

Conv2D

Conv2D는 32개, 64개의 특징을 찾아낸다는 말입니다.

Maxpooling

Maxpooling은 특성을 줄이는 기법입니다. 풀링을 하면 속도도 빨라지고, overfitting도 방지해줍니다. 하지만 요즘에는 잘 안씁니다. Maxpooling 대신 Strdie로 대체시켜서 Stride를 많이 사용합니다.

Padding

유효 합성곱(Valid convolutions): 패딩이 없는것 (stride 때문에 못갈 경우 멈추도록 도와주는 것)
동일 합성곱(Same convolutions): 패딩을 한 뒤 결과 이미지의 크기가 기존 이미지와 동일 (strdie 때문에 못갈경우 뒤에 패딩을 추가) (ex.p = (f-1)/2)

model = models.Sequential()
# 3차원임으로 input_shape=(28,28,1)이다. 
# 32개의 특성을 (3,3) 커널 형태
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
# 가장 큰 값 뽑는 것 (특성을 줄였다.)
# maxpooling은 속도도 빨라지고, overfitting도 방지해준다. 
# 컴퓨터 성능이 좋으면 안해줘도 되긴 한다. (특성값을 잃는 것이므로)
model.add(layers.MaxPooling2D((2, 2)))
# 64개의 특성으로 (3,3)
# relu를 쓰는 이유는 얘도 학습을 할 것이기 때문이다. 
# relu쓰는 이유는 overfitting 막기 위해서 
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

model.summary()
'''
Model: "sequential_90"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
=================================================================
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________
'''

머신러닝

전통적인 머신러닝을 할 때는 1차원(Flatten)으로 데이터를 만들어줘야합니다. Conv2D, Maxpooling 으로 찾아낸 특징들을 학습합니다.

# 전통적인 머신러닝 방법 - 1차원 (flatten)
# 특징을 학습한다. 
model.add(layers.Flatten())
# 64로 하는 이유는 Conv2D 64로 끝났기 때문이다? 아니다.
# 32로 바궈서 dense를 돌려도 돌아간다. 
# maxpooling 안하고 dense 안맞춰도 학습이 된다. 
# 사실상 성능이 크게 달라지지 않는다. 
# maxpooling은 속도도 빨라지고, overfitting도 방지해준다. 
model.add(layers.Dense(64, activation='relu'))
# layers.kernel_regularizer (l1 (|x|+|y|), l2 (x**2+y**2))
model.add(layers.Dense(10, activation='softmax'))

model.summary()
'''
Model: "sequential_90"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 576)               0         
_________________________________________________________________
dense_270 (Dense)            (None, 64)                36928     
_________________________________________________________________
dense_271 (Dense)            (None, 10)                650       
=================================================================
Total params: 93,322
Trainable params: 93,322
Non-trainable params: 0
_________________________________________________________________
'''

Compile

model.compile(optimizer='adam', loss ='sparse_categorical_crossentropy',
             metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=5)
'''
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 127s 2ms/sample - loss: 0.1477 - accuracy: 0.9537
Epoch 2/5
60000/60000 [==============================] - 115s 2ms/sample - loss: 0.0472 - accuracy: 0.9853
Epoch 3/5
60000/60000 [==============================] - 112s 2ms/sample - loss: 0.0342 - accuracy: 0.9891
Epoch 4/5
60000/60000 [==============================] - 115s 2ms/sample - loss: 0.0249 - accuracy: 0.9922
Epoch 5/5
60000/60000 [==============================] - 112s 2ms/sample - loss: 0.0204 - accuracy: 0.9936
<tensorflow.python.keras.callbacks.History at 0x1e6b1e01160>
'''

test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
# 10000/1 - 5s - loss: 0.0167 - accuracy: 0.9908

이미지는 과적합나기가 쉽습니다. 다음 기법들이 성능이 좋지만, 과적합되기가 쉽도록 하는 원인입니다.

layer
node
epoch

Overfitting을 막는 법에 대해 알아보도록 하겠습니다.

[overfitting 막는 법]

layers 에서 패널티를 줘서 과적합을 막을 수 있다.

l1
l2
l1 & l2 두개를 쓸 수도 있다.

layers.kernel_regularizer: (l1 (

), l2 (x2+y2))

cross_validation

확인하는 용도로 쓴다. (과적합을 막는건 아니지만, 확인 용도로 도와줄 수 있다.)
데이터가 작을 때 사용한다-차원의 저주로 오버피팅이 더 잘 난다.
단점: 시간이 오래걸리고, 비용이 많이 나온다. (딥러닝에서는 많이 안쓰지만, 그래도 확인해볼 수는 있다.)

dropout
layer 뒤에 추가 가능. activation 뒤에 추가 가능. 0.2 (20%)를 랜덤으로 빼버린다는 것. 학습속도가 느리다.

matplotlib에서 (state machine) - 앞에 있는 가장 가까운거 붙어서 하는 것 -> tf.keras.layers.Dropout(0.2) (앞에 Dense 붙어서 실행)

Early stopping

Ensemble

bagging
random boostrap 방법으로 샘플을 여러 번 뽑아 각 모델을 학습시켜 결과를 집계하는 방식이다.
boosting
성능 안좋은 것에 가중치줘서 학습시키는 것
stacking
A 알고리즘, B 알고리즘, C 알고리즘을 또 학습시켜서 나온결과로 또 학습시키는 것이다.

flowers

Share on

Twitter Facebook LinkedIn

[Tensorflow] CNN

Gyungah Cho

Convolution Neural Network

Conv2D

Maxpooling

Padding

[overfitting 막는 법]

Share on

Comments

You May Also Enjoy

[프로그래머스] Python3 알고리즘

[Tensorflow] Sequential vs. Model

[Tensorflow] model 연산

[Tensorflow] Autoencoder