trumee

Reputation: 393

Validation loss when using Dropout

I am trying to understand the effect of dropout on validation Mean Absolute Error (non-linear regression problem).

Without dropout

[Figure 1: training and validation MAE without dropout]

With dropout of 0.05

[Figure 2: training and validation MAE with dropout of 0.05]

With dropout of 0.075

[Figure 3: training and validation MAE with dropout of 0.075]

Without any dropout, the validation loss is higher than the training loss, as shown in Figure 1. My understanding is that, for a good fit, the validation loss should be only slightly higher than the training loss.

I carefully increased the dropout until the validation loss was close to the training loss, as seen in Figure 2. Since dropout is applied only during training and not during validation, the training loss is computed with units dropped, which is why the validation loss can end up lower than the training loss.
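For reference, here is a tiny sketch (assuming the TensorFlow/Keras backend; the shape and rate are illustrative) showing that the same Dropout layer is active during training but is a no-op at inference:

```python
import tensorflow as tf

# The same Dropout layer behaves differently in training vs. inference mode.
drop = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 10))

print(drop(x, training=True))   # ~half the units zeroed, the rest scaled by 1/(1 - 0.5)
print(drop(x, training=False))  # identity: the input passes through unchanged
```

This is why the training loss (computed with dropout active) can sit above the validation loss.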

Finally, the dropout was increased further, and the validation loss again became higher than the training loss, as seen in Figure 3.

Which of these three should be considered a good fit?

Following Marcin Możejko's answer, I evaluated predictions against three test sets, as shown in Figure 4. The y-axis shows RMS error instead of MAE. The model without dropout gave the best result.

[Figure 4: RMS prediction error on three test sets for each model]
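For completeness, the RMS error plotted above was computed along these lines (variable names are illustrative, not my exact code):

```python
import numpy as np

def rms_error(y_true, y_pred):
    """Root-mean-square error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# e.g. rms_error(y_test, model.predict(x_test).ravel())
```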

Upvotes: 10

Views: 9575

Answers (1)

Marcin Możejko

Reputation: 40516

Well, this is a really good question. In my opinion, the model with the lowest validation score (confirmed on a separate test set) is the best fit. Remember that, in the end, what matters most is your model's performance on totally new data; the fact that it performed even better on the training set is less important.

Moreover, I think your model might generally be underfitting. You could try extending it, e.g. with more layers or neurons, and then regularize it a little with dropout to prevent it from memorizing training examples.
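For example, a wider and deeper variant with light dropout might look roughly like this (a sketch only; the input shape, layer sizes and dropout rate are placeholders, not tuned values):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical larger model; all sizes and the dropout rate are placeholders.
model = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(10,)),
    layers.Dropout(0.1),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.1),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),                      # single output for regression
])
model.compile(optimizer="adam", loss="mae")
```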

If my hypothesis turns out to be false, remember that it is still possible that certain data patterns are present only in the validation set (this happens relatively often with medium-sized datasets), which would explain the divergence between training and validation loss. Moreover, even though your loss values seem to have saturated in the case without dropout, I think there is still room for improvement from simply increasing the number of epochs, as the losses appear to still be trending downward.

Another technique I recommend trying is reducing the learning rate on plateau (using, for example, the ReduceLROnPlateau callback), as your model seems to need refinement with a lower learning rate.
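A minimal usage sketch (the monitor, factor and patience values are illustrative, and `model`, `x_train`, etc. are assumed from your existing code):

```python
from tensorflow import keras

# Halve the learning rate whenever validation loss stalls for 10 epochs.
reduce_lr = keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",  # quantity to watch
    factor=0.5,          # new_lr = old_lr * factor
    patience=10,         # epochs with no improvement before reducing
    min_lr=1e-6,         # lower bound on the learning rate
)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=500,
          callbacks=[reduce_lr])
```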

Upvotes: 6
