Reputation: 139
I'm using Keras with the TensorFlow backend.
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense, Activation

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(MAX_LENGTH, 1)))
# Note: this input_shape is ignored, since Masking is already the first layer
model.add(LSTM(16, input_shape=(BATCH_SIZE, MAX_LENGTH, 1), return_sequences=False))
model.add(Dense(units=2))
model.add(Activation("sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
This Python code works, but I wonder whether this creates 16 LSTM blocks with 1 cell each, or 1 LSTM block with 16 cells.
Thanks in advance!
Upvotes: 5
Views: 364
Reputation: 1606
OK, so your question got me thinking and I think I overdid it, but here goes nothing. Here's a snippet of code I wrote to get some insight into the LSTM implementation.
from keras.layers import LSTM
from keras.models import Sequential
model = Sequential()
model.add(LSTM(10, input_shape=(20, 30), return_sequences=True))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
weights = model.get_weights()
Now, by inspecting the weight shapes, we can get some intuition about what's happening.
In [12]: weights[0].shape
Out[12]: (30, 40)
In [14]: weights[1].shape
Out[14]: (10, 40)
In [15]: weights[2].shape
Out[15]: (40,)
And here is a description of them:
In [26]: model.weights
Out[26]:
[<tf.Variable 'lstm_4/kernel:0' shape=(30, 40) dtype=float32_ref>,
<tf.Variable 'lstm_4/recurrent_kernel:0' shape=(10, 40) dtype=float32_ref>,
<tf.Variable 'lstm_4/bias:0' shape=(40,) dtype=float32_ref>]
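The trailing dimension of 40 in each of those shapes comes from the four gates being stored concatenated along one axis. A quick sanity check in plain Python (no Keras needed), assuming the layer above with 10 units and 30 input features:

```python
# The last dimension of every LSTM weight tensor is 4 * units, because the
# four gates (input, forget, cell candidate, output) are concatenated.
units = 10        # LSTM(10, ...)
features = 30     # input_shape=(20, 30) -> 30 features per timestep

kernel_shape = (features, 4 * units)          # maps the input to the gates
recurrent_kernel_shape = (units, 4 * units)   # maps the previous state to the gates
bias_shape = (4 * units,)

print(kernel_shape, recurrent_kernel_shape, bias_shape)
# (30, 40) (10, 40) (40,)
```

These match the `(30, 40)`, `(10, 40)` and `(40,)` shapes printed above.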
Those are the only weights available. I also went to see the Keras implementation at https://github.com/keras-team/keras/blob/master/keras/layers/recurrent.py#L1765
So you can see that @gorjan was right: it implements one cell, meaning the 4 gates (for the recurrent input as well as the sequence input), along with their biases.
The "layer" thinking here should be applied to the number of times the LSTM will be unrolled, in this case 20, the number of timesteps (the 30 in input_shape=(20, 30) is the feature dimension, as the kernel shape (30, 40) shows).
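To make the unrolling concrete, here's a rough NumPy sketch of one cell being applied repeatedly, once per timestep, for an input of shape (20, 30). The gate wiring follows the standard LSTM equations; it is a hand-rolled illustration, not Keras's actual implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

timesteps, features, units = 20, 30, 10
rng = np.random.default_rng(0)
x = rng.standard_normal((timesteps, features))

# The same three weight tensors as in the Keras layer above
kernel = rng.standard_normal((features, 4 * units)) * 0.1
recurrent_kernel = rng.standard_normal((units, 4 * units)) * 0.1
bias = np.zeros(4 * units)

h = np.zeros(units)   # hidden state
c = np.zeros(units)   # cell state
outputs = []
for t in range(timesteps):                  # the "unrolling": one cell reused 20 times
    z = x[t] @ kernel + h @ recurrent_kernel + bias
    i, f, g, o = np.split(z, 4)             # gate order in Keras: i, f, c, o
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    outputs.append(h)

outputs = np.stack(outputs)                 # shape (20, 10): one h per timestep
print(outputs.shape)
```

With return_sequences=True you get all 20 hidden states back; with return_sequences=False you would only get the last `h`.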
Hope this helps.
Upvotes: 4
Reputation: 5575
When you are using cells like LSTM or GRU, you don't have the notion of layers per se. What you actually have is a cell that implements a few gates. Each gate consists of a separate weight matrix that the model will learn during training. For example, in your case, what you will have is 1 cell, where each of the gates defined by the input weight matrices will have dimension (feature_size_of_your_input, 16). I suggest you read http://colah.github.io/posts/2015-08-Understanding-LSTMs/ really carefully before you start implementing this kind of stuff. Otherwise, you are just using these models as black boxes without understanding what is happening under the hood.
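One way to check this picture: the trainable parameter count of a single LSTM layer follows directly from the four gates. A small sketch (the formula is standard; `lstm_param_count` is a name I made up for illustration):

```python
# Each of the 4 gates has an input matrix (features x units), a recurrent
# matrix (units x units), and a bias (units), so:
def lstm_param_count(features, units):
    return 4 * (features * units + units * units + units)

# The asker's model: LSTM(16) over sequences with 1 feature per timestep
print(lstm_param_count(1, 16))    # 1152

# The answer's example: LSTM(10) with 30 features per timestep
print(lstm_param_count(30, 10))   # 1640
```

You can compare these numbers against what `model.summary()` reports for the LSTM layer.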
Upvotes: 1