Reputation: 891
According to the mathematical formulation from wikipedia-lstm-math-equation, there should be only a hidden state h_t and a cell state c_t. However, when I tried to write LSTM code in Keras, I get three outputs: lstm_output, state_h and state_c.

I am now wondering: what is the mathematical formulation of lstm_output?
Here is my code:
from keras.layers import Input, LSTM

lstm_input = Input(shape=(28, 10))
lstm_output, state_h, state_c = LSTM(units=32,
                                     return_sequences=True,
                                     return_state=True,
                                     unroll=True)(lstm_input)
print(lstm_output, state_h, state_c)
and it gives:
Using TensorFlow backend.
(<tf.Tensor 'lstm_1/transpose_1:0' shape=(?, 28, 32) dtype=float32>, <tf.Tensor 'lstm_1/mul_167:0' shape=(?, 32) dtype=float32>, <tf.Tensor 'lstm_1/add_221:0' shape=(?, 32) dtype=float32>)
Upvotes: 3
Views: 3164
Reputation: 11225
Let's break it down, looking at this line from the LSTM source code - return h, [h, c]:

- lstm_output is the h of every time step, so it has shape (batch_size, sequence_length, hidden_size), in your case (?, 28, 32). As the documentation says, it is returned as a full sequence because you set return_sequences=True.
- state_h is the last time step's h, and you can check that it is equal to lstm_output[:, -1] (see the sketch after this list). Notice that its shape is (?, 32), since it is the output at the last time step only, not at every time step.
- state_c is the last time step's c.
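To see this concretely, here is a minimal check. This is a sketch, not part of the original answer: it reuses the layer configuration from the question and feeds it a random batch purely for illustration:

import numpy as np
from keras.layers import Input, LSTM
from keras.models import Model

# Build a model that exposes all three outputs of the LSTM layer,
# mirroring the snippet in the question.
lstm_input = Input(shape=(28, 10))
lstm_output, state_h, state_c = LSTM(units=32,
                                     return_sequences=True,
                                     return_state=True,
                                     unroll=True)(lstm_input)
model = Model(inputs=lstm_input, outputs=[lstm_output, state_h, state_c])

x = np.random.random((4, 28, 10))  # a batch of 4 random sequences
out, h, c = model.predict(x)

print(out.shape)  # (4, 28, 32) -- h at every time step
print(h.shape)    # (4, 32)     -- h at the last time step
print(c.shape)    # (4, 32)     -- c at the last time step

# state_h is just the last slice of the full output sequence.
print(np.allclose(h, out[:, -1]))  # True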
The equations are often implemented in different ways to optimise for certain features, but they all follow the original paper. Note that there might be variations in the activations, such as using hard_sigmoid for the recurrent activation, and these should be clearly noted in the documentation.
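For reference, the standard LSTM formulation (the one on Wikipedia that the question refers to; Keras follows it up to the activation choices mentioned above) is, in LaTeX notation:

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
h_t = o_t \odot \tanh(c_t)

In these terms, lstm_output is the stacked sequence (h_1, ..., h_T), state_h is h_T, and state_c is c_T: there is no separate equation for lstm_output beyond the h_t recurrence applied at every time step.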
Upvotes: 1