Phillip Jay Doe

Reputation: 25

How does dropout work in keras' LSTM layer?

Keras's documentation gives no information about how dropout is actually implemented for LSTM layers.

However, there is a link to the paper "A Theoretically Grounded Application of Dropout in Recurrent Neural Networks", which led me to believe that dropout is implemented as described in said paper.

That is, for each time-step in the time-series the layer is processing, the same dropout mask is used.
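The idea from the paper can be illustrated with a minimal NumPy sketch (this is an illustration of variational dropout, not Keras code): the mask is sampled once per sequence and the same feature positions are zeroed at every time-step.

```python
import numpy as np

rng = np.random.default_rng(0)

timesteps, features = 5, 4
x = rng.normal(size=(timesteps, features))  # one input sequence

rate = 0.5
# Variational dropout: sample the mask ONCE per sequence,
# scaled by 1/(1 - rate) (inverted dropout).
mask = (rng.random(features) >= rate) / (1.0 - rate)

# Broadcasting applies the SAME mask at every time-step.
dropped = x * mask

# Every time-step has zeros at the same feature positions as the mask.
masked_positions = set(np.flatnonzero(mask == 0))
for t in range(timesteps):
    assert masked_positions <= set(np.flatnonzero(dropped[t] == 0))
```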

Looking at the source code, it seems to me that LSTMCell.call gets called iteratively, once for every time-step in the time-series, and generates a new dropout mask each time it is called.

My question is:

Either I misinterpreted keras' code, or the reference to the paper in keras' documentation is misleading. Which is it?

Upvotes: 2

Views: 1270

Answers (1)

Vikash Singh

Reputation: 14001

The paper and the code are consistent. You have understood the paper correctly, but you misread one detail of the code.

There is a check before the dropout mask is initialised: self._dropout_mask is None.

So LSTMCell.call does get called iteratively, once for every time-step in the time-series, but a new dropout mask is generated only on the first call; subsequent calls reuse the cached mask.

if 0 < self.dropout < 1 and self._dropout_mask is None:
    self._dropout_mask = _generate_dropout_mask(
        K.ones_like(inputs),
        self.dropout,
        training=training,
        count=4)
if (0 < self.recurrent_dropout < 1 and
        self._recurrent_dropout_mask is None):
    self._recurrent_dropout_mask = _generate_dropout_mask(
        K.ones_like(states[0]),
        self.recurrent_dropout,
        training=training,
        count=4)
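The caching pattern above can be sketched with a toy cell (TinyCell is a hypothetical stand-in for illustration, not actual Keras code): the mask is created on the first call and every later call reuses it, so all time-steps see the same mask.

```python
import numpy as np

class TinyCell:
    """Toy cell mimicking the None-check caching pattern in LSTMCell.call."""

    def __init__(self, dropout, seed=0):
        self.dropout = dropout
        self._dropout_mask = None
        self._rng = np.random.default_rng(seed)

    def call(self, inputs):
        # Mirrors the guard in the Keras snippet: generate a mask
        # only when none has been cached yet.
        if 0 < self.dropout < 1 and self._dropout_mask is None:
            keep = self._rng.random(inputs.shape) >= self.dropout
            self._dropout_mask = keep / (1.0 - self.dropout)
        return inputs * self._dropout_mask

cell = TinyCell(dropout=0.5)
x = np.ones(6)
step1 = cell.call(x)  # first time-step: mask is generated here
step2 = cell.call(x)  # second time-step: cached mask is reused
assert np.array_equal(step1, step2)  # identical mask across time-steps
```

In Keras the cached mask is reset between batches, so a fresh mask is drawn per sequence while staying fixed across time-steps within it, which is exactly what the paper prescribes.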

Hope that clarifies your doubt.

Upvotes: 2
