Reputation: 43
I'm using the ResNet50 model for transfer learning on 100,000 images covering 20 scenes (from the MIT Places365 dataset). I trained only the last 160 layers (due to memory restrictions). The problem is that I get fairly high training accuracy but extremely low validation accuracy. I think this might be an overfitting problem, but I don't know how to solve it. I would really appreciate any advice on how to fix the low val_acc. Thank you very much. My code is as follows:
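import random
import numpy as np
import keras
from keras.layers import Activation, Dense, Dropout, Flatten
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Validation images and their one-hot labels
V1 = np.load("C:/Users/Desktop/numpydataKeras_20_val/imgonehot_val_500.npy")
V2 = np.load("C:/Users/Desktop/numpydataKeras_20_val/labelonehot_val_500.npy")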
net = keras.applications.resnet50.ResNet50(include_top=False, weights='imagenet', input_tensor=None, input_shape=(224, 224, 3))
x = net.output
x = Flatten()(x)
x = Dense(128)(x)
x = Activation('relu')(x)
x = Dropout(0.5)(x)
output_layer = Dense(20, activation='softmax', name='softmax')(x)
net_final = Model(inputs=net.input, outputs=output_layer)
for layer in net_final.layers[:-160]:
    layer.trainable = False
for layer in net_final.layers[-160:]:
    layer.trainable = True
net_final.compile(Adam(lr=.00002122), loss='categorical_crossentropy', metrics=['accuracy'])
def data_generator():
    n = 100000
    Num_batch = 100000/100
    arr = np.arange(1000)
    while (True):
        for i in arr:
            seed01 = random.randint(0,1000000)
            X_batch = np.load( "C:/Users/Desktop/numpydataKeras/imgonehot_"+str((i+1)*100)+".npy" )
            y_batch = np.load( "C:/Users/Desktop/numpydataKeras/labelonehot_"+str((i+1)*100)+".npy" )
            yield X_batch, y_batch
weights_file = 'C:/Users/Desktop/Transfer_learning_resnet50_fit_generator_02s.h5'
early_stopping = EarlyStopping(monitor='val_acc', patience=5, mode='auto', verbose=2)
model_checkpoint = ModelCheckpoint(weights_file, monitor='val_acc', save_best_only=True, verbose=2)
callbacks = [early_stopping, model_checkpoint]
model_fit = net_final.fit_generator(
    validation_data=(V1, V2),
The output is as follows:
Epoch 1/5
1000/1000 [==============================] - 3481s 3s/step - loss: 1.7917 - acc: 0.4757 - val_loss: 3.5872 - val_acc: 0.0560
Epoch 00001: val_acc improved from -inf to 0.05600, saving model to C:/Users/Desktop/Transfer_learning_resnet50_fit_generator_02s.h5
Epoch 2/5
1000/1000 [==============================] - 4884s 5s/step - loss: 1.1287 - acc: 0.6595 - val_loss: 4.2113 - val_acc: 0.0520
Epoch 00002: val_acc did not improve from 0.05600
Epoch 3/5
1000/1000 [==============================] - 4964s 5s/step - loss: 0.8033 - acc: 0.7464 - val_loss: 4.9595 - val_acc: 0.0520
Epoch 00003: val_acc did not improve from 0.05600
Epoch 4/5
1000/1000 [==============================] - 4961s 5s/step - loss: 0.5677 - acc: 0.8143 - val_loss: 4.5484 - val_acc: 0.0520
Epoch 00004: val_acc did not improve from 0.05600
Epoch 5/5
1000/1000 [==============================] - 4928s 5s/step - loss: 0.3999 - acc: 0.8672 - val_loss: 4.6155 - val_acc: 0.0400
Epoch 00005: val_acc did not improve from 0.05600
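For reference, a classifier that picks one of 20 classes uniformly at random is correct 1/20 = 5% of the time, so the val_acc values in this log sit at chance level. A minimal sketch of that comparison, using only the numbers printed above (the variable names are illustrative):

```python
num_classes = 20  # number of scene categories stated in the question
chance_accuracy = 1.0 / num_classes

# val_acc values copied from the five epochs in the log
val_acc = [0.0560, 0.0520, 0.0520, 0.0520, 0.0400]

for epoch, acc in enumerate(val_acc, start=1):
    ratio = acc / chance_accuracy
    print(f"epoch {epoch}: val_acc={acc:.4f} ({ratio:.2f}x chance)")
```

Validation accuracy that stays near chance while training accuracy climbs may point to a systematic mismatch between the training and validation passes rather than ordinary overfitting, though the log alone cannot confirm that.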
Upvotes: 3
Views: 1651
Reputation: 2941
From what I have read, it seems that the batch normalization layers should be trainable.
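A possible reason, offered as an assumption rather than something confirmed for this exact setup: batch normalization normalizes with the current batch's statistics in training mode but with stored moving statistics at inference. If the stored statistics come from different data, the two modes can produce very different activations. The plain-Python sketch below illustrates the effect; the input values and the "stale" statistics are made up, and the learned scale and offset are omitted for brevity.

```python
import math
import statistics

EPS = 1e-3  # small constant for numerical stability

def batchnorm_train(xs):
    """Normalize with the statistics of the current batch (training mode)."""
    mean = statistics.fmean(xs)
    var = statistics.pvariance(xs)
    return [(x - mean) / math.sqrt(var + EPS) for x in xs]

def batchnorm_infer(xs, moving_mean, moving_var):
    """Normalize with stored moving statistics (inference mode)."""
    return [(x - moving_mean) / math.sqrt(moving_var + EPS) for x in xs]

# Hypothetical activations whose distribution differs from the stored statistics
batch = [4.0, 5.0, 6.0, 7.0, 8.0]
stale_mean, stale_var = 0.0, 1.0  # made-up stand-ins for statistics learned on other data

train_out = batchnorm_train(batch)
infer_out = batchnorm_infer(batch, stale_mean, stale_var)

print("training-mode mean:", round(statistics.fmean(train_out), 3))
print("inference-mode mean:", round(statistics.fmean(infer_out), 3))
```

The training-mode output is centered on zero, while the inference-mode output keeps most of the input's offset, so later layers see inputs they were never trained on.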
The following code can replace the loop where you set/unset trainable layers:
from keras import backend as K

for layer in model.layers:
    if hasattr(layer, 'moving_mean') and hasattr(layer, 'moving_variance'):
        # Batch normalization layer: keep it trainable and reset its moving statistics
        layer.trainable = True
        K.eval(K.update(layer.moving_mean, K.zeros_like(layer.moving_mean)))
        K.eval(K.update(layer.moving_variance, K.zeros_like(layer.moving_variance)))
    else:
        # Any other layer: freeze it
        layer.trainable = False
On my own data, I needed to reduce batch size to avoid OOM, and I now have:
Epoch 1/10
470/470 [==============================] - 90s 192ms/step - loss: 0.3513 - acc: 0.8660 - val_loss: 0.1299 - val_acc: 0.9590
Epoch 2/10
470/470 [==============================] - 83s 177ms/step - loss: 0.2204 - acc: 0.9163 - val_loss: 0.1276 - val_acc: 0.9471
Epoch 3/10
470/470 [==============================] - 83s 177ms/step - loss: 0.2219 - acc: 0.9184 - val_loss: 0.1048 - val_acc: 0.9589
Epoch 4/10
470/470 [==============================] - 83s 177ms/step - loss: 0.1813 - acc: 0.9327 - val_loss: 0.1857 - val_acc: 0.9303
Warning: this may affect accuracy, and you must freeze your model to avoid weird inference results. Still, it is the only approach that worked for me.
Another comment only checks the layer names and sets a layer trainable if it is a batch normalization layer, but that didn't change anything for me. Maybe it can help with your dataset.
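The simpler variant mentioned above could look roughly like the sketch below. It matches on the layer's class name instead of `layer.name`, which is a substitution made here because layer names differ between Keras versions. The helper name `unfreeze_batchnorm` is hypothetical, and it accepts any iterable of objects with a `trainable` attribute, such as `model.layers`.

```python
def unfreeze_batchnorm(layers):
    """Set trainable=True on every layer whose class is named BatchNormalization.

    Returns the number of layers that matched.
    """
    count = 0
    for layer in layers:
        if type(layer).__name__ == "BatchNormalization":
            layer.trainable = True
            count += 1
    return count
```

With a Keras model this would be called as `unfreeze_batchnorm(model.layers)` before `compile`, since changes to `trainable` only take effect once the model is compiled again.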
Upvotes: 1