Reputation: 7686
According to the API docs of tf.contrib.rnn.DropoutWrapper:

output_keep_prob: unit Tensor or float between 0 and 1, output keep probability; if it is constant and 1, no output dropout will be added.

state_keep_prob: unit Tensor or float between 0 and 1, output keep probability; if it is constant and 1, no output dropout will be added. State dropout is performed on the output states of the cell.

The descriptions of these two parameters are almost the same, right?
When I leave output_keep_prob at its default and set state_keep_prob=0.2, the loss stays around 11.3 even after 400 mini-batches of training. When I instead set output_keep_prob=0.2 and leave state_keep_prob at its default, the loss quickly drops to around 6.0 after only 20 mini-batches! It took me 4 days to find this, which feels like magic. Can anyone explain the difference between these two parameters? Thanks a lot!
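To make the two settings concrete, this is roughly how I wrap my cell in each run (a simplified sketch, not my actual model; the cell size is just a placeholder):

import tensorflow as tf

lstm_cell = tf.contrib.rnn.LSTMCell(num_units=128)  # placeholder size, not my real hyperparameter

# Run A: output_keep_prob left at its default (1.0), state dropout only
# -> loss stuck around 11.3 after 400 mini-batches
cell_a = tf.contrib.rnn.DropoutWrapper(lstm_cell, state_keep_prob=0.2)

# Run B: state_keep_prob left at its default (1.0), output dropout only
# -> loss drops to around 6.0 after 20 mini-batches
cell_b = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=0.2)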
Hyperparameters:
Here is the dataset.
Upvotes: 1
Views: 1577
Reputation: 31
Both are correctly described as an output keep probability; which one you should use depends on whether you decide to use outputs or states to compute your logits.
I am providing a code snippet for you to play around with and explore the use cases:
import tensorflow as tf
import numpy as np

tf.reset_default_graph()

# Create input data
X = np.random.randn(2, 20, 8)

# The first example is of length 6
X[0, 6:] = 0
X_lengths = [6, 20]

rnn_layers = [tf.nn.rnn_cell.LSTMCell(size, state_is_tuple=True)
              for size in [3, 7]]
rnn_layers = [tf.nn.rnn_cell.DropoutWrapper(lstm_cell,
                                            state_keep_prob=0.8,
                                            output_keep_prob=0.8)
              for lstm_cell in rnn_layers]

multi_rnn_cell = tf.nn.rnn_cell.MultiRNNCell(rnn_layers)

outputs, states = tf.nn.dynamic_rnn(
    cell=multi_rnn_cell,
    dtype=tf.float64,
    sequence_length=X_lengths,
    inputs=X)

result = tf.contrib.learn.run_n(
    {"outputs": outputs, "states": states},
    n=1,
    feed_dict=None)

assert result[0]["outputs"].shape == (2, 20, 7)

print(result[0]["states"][0].h)
print(result[0]["states"][-1].h)
print(result[0]["outputs"][0][5])
print(result[0]["outputs"][-1][-1])
print(result[0]["outputs"].shape)
print(result[0]["outputs"][0].shape)
print(result[0]["outputs"][1].shape)

assert (result[0]["outputs"][-1][-1] == result[0]["states"][-1].h[-1]).all()
assert (result[0]["outputs"][0][5] == result[0]["states"][-1].h[0]).all()
result[0]["outputs"][0][6:]
will be arrays of all 0s.
Both the assertions will fail in case when state_keep_prob
and output_keep_prob
are <1 but when equated to the same value say 0.8 as in this example you can see apart from the dropout mask they produce the same final state.
If you have a variable sequence_length you should definitely use states to compute your logits, and in that case use state_keep_prob < 1 while training. If you plan to use outputs (appropriate with a constant sequence_length, or when you need the output at every time step; with a variable sequence_length it needs further manipulation to pick out the last valid output), use output_keep_prob < 1 while training.
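For concreteness, here is a minimal sketch of the two ways to compute logits from the snippet above (num_classes and the dense layers are illustrative additions, not part of the snippet):

num_classes = 5  # illustrative

# Option 1: logits from the final state of the top LSTM layer.
# states[-1].h already corresponds to the last *valid* time step of each
# sequence, so this handles variable sequence_length correctly.
logits_from_state = tf.layers.dense(states[-1].h, num_classes)

# Option 2: logits from the outputs tensor. Taking outputs[:, -1, :] is
# only correct when every sequence uses the full length; with variable
# lengths you would have to gather the last valid step yourself.
last_output = outputs[:, -1, :]
logits_from_output = tf.layers.dense(last_output, num_classes)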
If output_keep_prob and state_keep_prob are both used with different dropout values, then the final state returned in outputs and the one returned in states will differ, along with their dropout masks.
Upvotes: 1
Reputation: 1768
state_keep_prob is the dropout applied to the RNN's hidden states. Dropout applied to the state at time step i influences the calculation of states i+1, i+2, .... As you have discovered, this propagation effect is often detrimental to the learning process.

output_keep_prob is the dropout applied to the RNN's outputs; this dropout has no effect on the calculation of the subsequent states.

Upvotes: 4
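To make the distinction above concrete, here is a toy NumPy sketch of where each mask enters the recurrence (purely illustrative; this is not the actual DropoutWrapper implementation, and the simple tanh cell, sizes, and keep probabilities are made up):

import numpy as np

rng = np.random.RandomState(0)
state_keep_prob, output_keep_prob = 0.2, 1.0  # the asker's first configuration
hidden = 4
W, U = rng.randn(hidden, hidden), rng.randn(hidden, hidden)
x = rng.randn(10, hidden)  # 10 time steps of input
state = np.zeros(hidden)

for t in range(10):
    state = np.tanh(x[t] @ U + state @ W)  # the recurrence itself
    output = state                         # a plain RNN's output is its state

    # State dropout: the dropped state feeds step t+1, so the mask's
    # effect propagates through every later time step.
    state = state * (rng.rand(hidden) < state_keep_prob) / state_keep_prob

    # Output dropout: only what consumers of `output` see is dropped;
    # the recurrence itself is untouched.
    output = output * (rng.rand(hidden) < output_keep_prob) / output_keep_prob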