Reputation: 3616
It is common to use a dropout rate of 0.5 as a default, which I also use in my fully-connected network. This advice follows the recommendations of the original Dropout paper (Hinton et al.).
My network consists of fully-connected layers of size [1000, 500, 100, 10, 100, 500, 1000, 20].
I do not apply dropout to the last layer, but I do apply it to the bottleneck layer of size 10. This does not seem reasonable given that dropout = 0.5; I suspect too much information gets lost. Is there a rule of thumb for how to treat bottleneck layers when using dropout? Is it better to increase the size of the bottleneck or to decrease the dropout rate?
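For concreteness, here is a minimal Keras sketch of my setup; the input dimension (20), the activations, and the loss are placeholders, since I have not listed them above:

```python
# Minimal sketch of the network described above (assumptions: input dim 20,
# ReLU activations, MSE loss -- none of these are specified in the question).
from tensorflow.keras import layers, models

hidden_sizes = [1000, 500, 100, 10, 100, 500, 1000]
rate = 0.5  # the dropout rate under discussion

model = models.Sequential()
model.add(layers.Input(shape=(20,)))  # assumed input dimension
for size in hidden_sizes:
    model.add(layers.Dense(size, activation="relu"))
    # dropout after every hidden layer, including the size-10 bottleneck
    model.add(layers.Dropout(rate))
model.add(layers.Dense(20))  # last layer, no dropout
model.compile(optimizer="adam", loss="mse")
```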
Upvotes: 0
Views: 486
Reputation: 147
A dropout layer is added to a neural network to prevent overfitting (regularization).
Dropout adds noise to a layer's output values in order to break up the happenstance patterns (co-adaptations) that cause overfitting.
A dropout rate of 0.5 means 50% of the values are dropped, which is a high noise ratio and a definite no for a bottleneck layer: with only 10 units, each forward pass keeps roughly 5 of them, so most of the information the bottleneck is supposed to compress gets lost.
I would recommend you first train your bottleneck layer without dropout, then compare the results against increasing dropout rates.
Choose the model that validates best on your test data.
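A hedged sketch of that comparison, again in Keras; `x_train` / `x_val` and the input dimension are placeholders you would replace with your own data:

```python
# Build two variants: one with dropout on the size-10 bottleneck, one without.
from tensorflow.keras import layers, models

def build(bottleneck_dropout):
    sizes = [1000, 500, 100, 10, 100, 500, 1000]
    model = models.Sequential([layers.Input(shape=(20,))])  # assumed input dim
    for size in sizes:
        model.add(layers.Dense(size, activation="relu"))
        if size == 10 and not bottleneck_dropout:
            continue  # skip dropout on the size-10 bottleneck
        model.add(layers.Dropout(0.5))
    model.add(layers.Dense(20))
    model.compile(optimizer="adam", loss="mse")
    return model

# Train both variants and keep whichever validates better, e.g.:
# for flag in (False, True):
#     model = build(bottleneck_dropout=flag)
#     model.fit(x_train, x_train, validation_data=(x_val, x_val), epochs=50)
```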
Upvotes: 2