Reputation: 2155
I am trying to use BatchNorm in Keras. The training accuracy increases over time, from 12% to 20%, slowly but surely. The test accuracy, however, decreases from 12% to 0%. The random baseline is 12%.
I strongly suspect this is due to the batchnorm layers (removing them brings the test accuracy back to ~12%), perhaps because the gamma and beta parameters are not initialized well. Is there anything special I need to take into account when applying batchnorm? I don't really understand what else could have gone wrong. I have the following model:
from keras.models import Sequential
from keras.layers import BatchNormalization, Reshape, Conv2D, Dense, Activation
from keras import optimizers, regularizers
from keras.initializers import RandomNormal  # only needed for the commented-out initializers below

model = Sequential()
model.add(BatchNormalization(input_shape=(16, 8)))
model.add(Reshape((16, 8, 1)))
#1. Conv (64 filters; 3x3 kernel)
model.add(default_Conv2D())
model.add(BatchNormalization(axis=3))
model.add(Activation('relu'))
#2. Conv (64 filters; 3x3 kernel)
model.add(default_Conv2D())
model.add(BatchNormalization(axis=3))
model.add(Activation('relu'))
...
#8. Affine (NUM_GESTURES units) Output layer
model.add(default_Dense(NUM_GESTURES))
model.add(Activation('softmax'))
sgd = optimizers.SGD(lr=0.1)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
default_Conv2D and default_Dense are defined as follows:
def default_Conv2D():
    return Conv2D(
        filters=64,
        kernel_size=3,
        strides=1,
        padding='same',
        # activation=None,
        # use_bias=True,
        # kernel_initializer=RandomNormal(mean=0.0, stddev=0.01, seed=None),  # RandomUniform(),
        kernel_regularizer=regularizers.l2(0.0001),
        # bias_initializer=RandomNormal(mean=0.0, stddev=0.01, seed=None),  # RandomUniform(),
        # bias_regularizer=None
    )
def default_Dense(units):
    return Dense(
        units=units,
        # activation=None,
        # use_bias=True,
        # kernel_initializer=RandomNormal(mean=0.0, stddev=0.01, seed=None),  # RandomUniform(),
        # bias_initializer=RandomNormal(mean=0.0, stddev=0.01, seed=None),  # RandomUniform(),
        kernel_regularizer=regularizers.l2(0.0001),
        # bias_regularizer=None
    )
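For context on the gamma/beta question: in the standard Keras API, BatchNormalization already initializes gamma to ones and beta to zeros, and both initializers can be overridden explicitly. A minimal sketch of one Conv/BatchNorm/ReLU block (the (16, 8, 1) shape just mirrors the model above):

from keras.models import Sequential
from keras.layers import BatchNormalization, Conv2D, Activation

model = Sequential()
model.add(Conv2D(64, 3, padding='same', input_shape=(16, 8, 1)))
model.add(BatchNormalization(
    axis=-1,                    # channel axis for channels_last data (same as axis=3 here)
    gamma_initializer='ones',   # scale parameter; this is the Keras default
    beta_initializer='zeros',   # shift parameter; this is the Keras default
))
model.add(Activation('relu'))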
Upvotes: 2
Views: 3104
Reputation: 2155
It seems that something was broken in Keras itself.
A naive
pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
did the trick.
@wontonimo, thanks a lot for your really great answer!
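To verify that the upgraded build is the one actually being imported, a quick version check helps (a generic snippet, not part of the original fix):

import keras
print(keras.__version__)  # should reflect the freshly installed GitHub build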
Upvotes: 0
Reputation: 3763
The issue is overfitting.
This is supported by your first two observations:
The first tells me that your network is memorizing the training set. The second tells me that when you prevent the network from memorizing the training set (or from learning at all), it stops making errors that stem from memorization.
There are a few solutions to overfitting, but the problem is larger than this post. Please treat the following list as a "top" list, not an exhaustive one:
slow increase in accuracy
As a side note, you hinted that your accuracy isn't increasing as fast as you'd like when you said slowly but surely. I've had great success when I've done all of the following steps:
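As a generic illustration of the kind of overfitting countermeasures typically used with a Keras model like the one above (dropout to curb memorization and early stopping on validation loss; these are common examples, not a prescription, and x_train/y_train/x_val/y_val are placeholder arrays, not from the question):

from keras.layers import Dropout
from keras.callbacks import EarlyStopping

# Dropout (e.g. after one of the Conv/BatchNorm/ReLU blocks) makes it harder
# for the network to memorize the training set.
model.add(Dropout(0.5))

# Stop training once validation loss stops improving, which limits overfitting.
early_stop = EarlyStopping(monitor='val_loss', patience=5)

model.fit(x_train, y_train,                  # placeholder training arrays
          validation_data=(x_val, y_val),    # placeholder validation split
          epochs=100,
          callbacks=[early_stop])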
Upvotes: 4