guorui

Reputation: 891

The mathematical formulation of LSTM in Keras?

According to the mathematical formulation from wikipedia-lstm-math-equation, there should be only a hidden state h_t and a cell state c_t. However, when I write LSTM code in Keras, there are three outputs: lstm_output, state_h and state_c.
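
For reference, the standard equations from that article, writing \sigma for the gate sigmoid and \circ for elementwise multiplication, are:

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t
h_t = o_t \circ \tanh(c_t)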

I am now wondering: what is the mathematical formulation of lstm_output? Here is my code:

from keras.layers import Input, LSTM

lstm_input = Input(shape=(28, 10))

lstm_output, state_h, state_c = LSTM(units=32,
                                     return_sequences=True,
                                     return_state=True,
                                     unroll=True)(lstm_input)
print(lstm_output, state_h, state_c)

and it gives

Using TensorFlow backend.

(<tf.Tensor 'lstm_1/transpose_1:0' shape=(?, 28, 32) dtype=float32>, <tf.Tensor 'lstm_1/mul_167:0' shape=(?, 32) dtype=float32>, <tf.Tensor 'lstm_1/add_221:0' shape=(?, 32) dtype=float32>)

Upvotes: 3

Views: 3164

Answers (1)

nuric

Reputation: 11225

Let's break it down, looking at this line from the source code - return h, [h, c]:

  • lstm_output: the h of every timestep, so it has shape (batch_size, sequence_length, hidden_size), which in your case is (?, 28, 32). As the documentation says, the full sequence is returned because you set return_sequences=True.
  • state_h: the h of the last timestep. If you check, it should be equal to lstm_output[:, -1] (see the quick check after this list). Its shape is (?, 32) because it is the output of the last timestep only, not of every timestep.
  • state_c: the c (cell state) of the last timestep.
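
A quick way to confirm the relation between lstm_output and state_h is to wrap the layer in a model and compare the arrays. A minimal sketch, assuming the same standalone Keras with the TensorFlow backend as in your snippet (the variable names and random input are just for illustration):

import numpy as np
from keras.layers import Input, LSTM
from keras.models import Model

inp = Input(shape=(28, 10))
seq, h, c = LSTM(units=32, return_sequences=True, return_state=True)(inp)
model = Model(inp, [seq, h, c])

x = np.random.random((4, 28, 10)).astype("float32")  # a batch of 4 random sequences
seq_val, h_val, c_val = model.predict(x)

print(seq_val.shape, h_val.shape, c_val.shape)  # (4, 28, 32) (4, 32) (4, 32)
print(np.allclose(seq_val[:, -1], h_val))       # True: state_h is the last timestep of lstm_output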

The equations are often implemented in slightly different ways to optimise for certain features, but they all follow the original paper. Note that there may be variations in the activations, such as using hard_sigmoid for the recurrent activation; these should be clearly noted in the documentation.
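
To make the mapping to the equations concrete, here is a rough NumPy sketch of a single LSTM step, using hard_sigmoid on the gates and the Keras i, f, c, o gate ordering (the function and variable names are illustrative, not the actual Keras implementation):

import numpy as np

def hard_sigmoid(x):
    # piecewise-linear approximation of the sigmoid, as used by Keras
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)
    z = x_t @ W + h_prev @ U + b
    i, f, c_hat, o = np.split(z, 4, axis=-1)  # gates in Keras order: i, f, c, o
    i = hard_sigmoid(i)                       # input gate
    f = hard_sigmoid(f)                       # forget gate
    o = hard_sigmoid(o)                       # output gate
    c_t = f * c_prev + i * np.tanh(c_hat)     # new cell state
    h_t = o * np.tanh(c_t)                    # new hidden state (= the layer's output)
    return h_t, c_t

Running such a step over all 28 timesteps and stacking the h_t values would give lstm_output, while the h_t and c_t from the final step correspond to state_h and state_c.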

Upvotes: 1
