I-was-a-Ki

Reputation: 139

Implementation of LSTM in Keras

I'm using Keras with the TensorFlow backend.

from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense, Activation

model = Sequential()
model.add(Masking(mask_value = 0., input_shape = (MAX_LENGTH, 1)))
model.add(LSTM(16, return_sequences = False))  # input shape is inferred from the Masking layer; input_shape never includes the batch size
model.add(Dense(units = 2))
model.add(Activation("sigmoid"))
model.compile(loss = "binary_crossentropy", optimizer = "adam", metrics = ["accuracy"])

This Python code works, but I wonder whether it creates 16 LSTM blocks with 1 cell each, or 1 LSTM block with 16 cells.
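For reference, model.summary() on the model above reports the LSTM layer's parameter count:

model.summary()
# The LSTM layer shows 1152 trainable parameters:
# 4 gates * (1 * 16 input weights + 16 * 16 recurrent weights + 16 biases) = 1152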

Thanks in advance!

LSTM architecture

Upvotes: 5

Views: 364

Answers (3)

Diego Aguado

Reputation: 1606

OK, your question got me thinking and I think I overdid it, but here goes nothing. Here's a snippet of code I wrote to get some insight into the LSTM implementation.

from keras.layers import LSTM
from keras.models import Sequential

model = Sequential()
# input_shape = (timesteps, features) = (20, 30), with 10 LSTM units
model.add(LSTM(10, input_shape=(20, 30), return_sequences=True))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
weights = model.get_weights()

Now, by inspecting the shapes of the weights, we can get an intuition about what is happening.

In [12]: weights[0].shape
Out[12]: (30, 40)
In [14]: weights[1].shape
Out[14]: (10, 40)
In [15]: weights[2].shape
Out[15]: (40,)

And here is a description of them:

In [26]: model.weights
Out[26]: 
[<tf.Variable 'lstm_4/kernel:0' shape=(30, 40) dtype=float32_ref>,
 <tf.Variable 'lstm_4/recurrent_kernel:0' shape=(10, 40) dtype=float32_ref>,
 <tf.Variable 'lstm_4/bias:0' shape=(40,) dtype=float32_ref>]

Those are the only weights available. I also went to look at the Keras implementation at https://github.com/keras-team/keras/blob/master/keras/layers/recurrent.py#L1765

So you can see that @gorjan was right: it implements one cell, meaning the 4 gates (for the recurrent input as well as the sequence input), along with their biases. The 40 in each shape is the 4 gates times the 10 units: the kernel (30, 40) holds the input weights, the recurrent_kernel (10, 40) the recurrent weights, and the bias (40,) the biases for all 4 gates.
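To make the four-gates point concrete, here is a follow-up sketch on the same model; the i/f/c/o ordering below matches how the Keras source linked above slices the kernel:

import numpy as np

units = 10
kernel, recurrent_kernel, bias = model.get_weights()

# Keras concatenates the four gate weights along the last axis,
# in the order: input gate, forget gate, cell candidate, output gate
k_i, k_f, k_c, k_o = np.split(kernel, 4, axis = 1)
r_i, r_f, r_c, r_o = np.split(recurrent_kernel, 4, axis = 1)
b_i, b_f, b_c, b_o = np.split(bias, 4)

print(k_i.shape, r_i.shape, b_i.shape)  # (30, 10) (10, 10) (10,)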

The "layer" thinking here should be applied to the number of times the LSTM will be unrolled, in this case 30.

Hope this helps.

Upvotes: 4

gorjan

Reputation: 5575

When you use cells like LSTM or GRU, there is no notion of layers per se; what you actually have is a cell that implements a few gates. Each gate is a separate weight matrix that the model learns during training. For example, in your case, you will have 1 cell, where each of the gates is defined by a matrix of dimension (feature_size_of_your_input, 16). I suggest that you read http://colah.github.io/posts/2015-08-Understanding-LSTMs/ really carefully before you start implementing this kind of stuff. Otherwise, you are just using it as a black-box model without understanding what is happening under the hood.
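As a sanity check of those dimensions, the standard LSTM parameter count for the model in the question (feature size 1, 16 units) works out as follows; the numbers are plain arithmetic, nothing Keras-specific:

feature_size = 1   # the question's input has 1 feature per timestep
units = 16

# each of the 4 gates has an input kernel, a recurrent kernel and a bias
params_per_gate = feature_size * units + units * units + units
print(4 * params_per_gate)  # 1152, matching what model.summary() reports for that layer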

Upvotes: 1

Slam

Reputation: 8582

It's 1 block with 16 cells, AFAIK.

Upvotes: 1
