Reputation: 2744
I'm training a simple neural network that looks as follows:
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop

model = Sequential()
model.add(layers.GRU(32,
                     input_shape=(None, 13)))
model.add(layers.Dense(1))
model.compile(optimizer=RMSprop(), loss='mae')

history = model.fit_generator(train_gen,
                              steps_per_epoch=500,
                              epochs=40,
                              validation_data=val_gen)
Everything works fine and runs as it is supposed to. However, it suffers from overfitting, so I'm adding dropout regularization to the GRU layer like so:
model.add(layers.GRU(32,
                     dropout=0.2,
                     recurrent_dropout=0.2,
                     input_shape=(None, 13)))
This increases the running time of an epoch from roughly 10 seconds to roughly 120 seconds. Does anyone have an explanation for why this happens? And is there a way to combat it, since a running time 12 times higher seems a bit extraordinary to me?
Upvotes: 0
Views: 866
Reputation: 461
I found the excerpt below in the paper linked underneath, which highlights why dropout increases the time taken per epoch. The slowdown you are seeing is a known drawback of using dropout.
One of the drawbacks of dropout is that it increases training time. A dropout network typically takes 2-3 times longer to train than a standard neural network of the same architecture. A major cause of this increase is that the parameter updates are very noisy. Each training case effectively tries to train a different random architecture. Therefore, the gradients that are being computed are not gradients of the final architecture that will be used at test time. Therefore, it is not surprising that training takes a long time.
For more details, you can refer to this paper on dropout: http://jmlr.org/papers/v15/srivastava14a.html
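To make the "different random architecture per training case" point concrete, here is a minimal NumPy sketch of what inverted dropout does to a batch of activations. This is only an illustration of the mechanism described in the excerpt, not Keras's actual implementation; the array sizes and variable names are made up for the example.

import numpy as np

rng = np.random.default_rng(0)
rate = 0.2                                   # same rate as dropout=0.2 above

# A toy batch of 4 samples, each with 32 GRU units' worth of activations.
activations = rng.normal(size=(4, 32))

# Each sample gets its own random binary mask, so each sample effectively
# trains a different thinned sub-network. Inverted dropout rescales the
# surviving units by 1 / (1 - rate) so the expected activation is unchanged.
mask = rng.random(activations.shape) >= rate
dropped = activations * mask / (1.0 - rate)

# The gradients computed from `dropped` therefore differ from sample to
# sample even for identical inputs, which is the extra noise the paper
# says slows training down.
print(dropped[0, :5])
print(dropped[1, :5])

Because every update is computed on a randomly thinned network rather than the full one, more epochs (and hence more wall-clock time) are needed before the full architecture converges.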
Additionally, this answer https://stats.stackexchange.com/a/377126/197455 gives a good explanation.
Cheers!
Upvotes: 2