Reputation: 4490
I am learning TensorFlow and Keras to implement an LSTM many-to-many model where the length of the input sequence is equal to the length of the output sequence.
Sample Code:
Inputs:
voc_size = 10000
embed_dim = 64
lstm_units = 75
size_batch = 30
count_classes = 5
Model:
from tensorflow.keras.layers import (Bidirectional, LSTM,
                                     Dense, Embedding, TimeDistributed)
from tensorflow.keras import Sequential

def sample_build(embed_dim, voc_size, batch_size, lstm_units, count_classes):
    model = Sequential()
    # Map each token id to an embed_dim-dimensional vector
    model.add(Embedding(input_dim=voc_size,
                        output_dim=embed_dim, input_length=50))
    # return_sequences=True keeps one output per time step
    model.add(Bidirectional(LSTM(units=lstm_units, return_sequences=True),
                            merge_mode="ave"))
    model.add(Dense(200))
    model.add(TimeDistributed(Dense(count_classes+1)))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    model.summary()
    return model

# Pass lstm_units (defined above); rnn_units was never defined
sample_model = sample_build(embed_dim, voc_size,
                            size_batch, lstm_units,
                            count_classes)
I am having trouble understanding the input and output shapes of each layer. For example, the output shape of the Embedding layer is (BATCH_SIZE, time_steps, length_of_input), which in this case is (30, 50, 64).
Similarly, the output shape of the Bidirectional LSTM layer is (30, 50, 75). This will be the input to the next Dense layer with 200 units. But the shape of the weight matrix of the Dense layer is (number of units in the current layer, number of units in the previous layer), which is (200, 75) in this case. So how does the matrix calculation happen between the 2D weight matrix of the Dense layer and the 3D output of the Bidirectional layer? Any explanation of the shapes would be helpful.
Upvotes: 1
Views: 135
Reputation: 7765
The Dense layer can operate on 3D input. The result is the same as flattening the input to shape (batch_size * time_steps, features), applying the dense layer, and reshaping the output back to (batch_size, time_steps, units). Keras's documentation of the Dense layer says:
Note: If the input to the layer has a rank greater than 2, then Dense computes the dot product between the inputs and the kernel along the last axis of the inputs and axis 1 of the kernel (using tf.tensordot). For example, if input has dimensions (batch_size, d0, d1), then we create a kernel with shape (d1, units), and the kernel operates along axis 2 of the input, on every sub-tensor of shape (1, 1, d1) (there are batch_size * d0 such sub-tensors). The output in this case will have shape (batch_size, d0, units).
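To make the note above concrete, here is a minimal sketch in plain NumPy (a stand-in for Keras, not the library's actual implementation). It assumes a random array in place of the Bidirectional LSTM output and a random kernel of shape (d1, units) = (75, 200); the variable names are only for illustration. It checks that contracting the last axis of the input with the kernel gives the same result as the flatten, multiply, reshape route.

```python
import numpy as np

batch_size, time_steps, features, units = 30, 50, 75, 200

rng = np.random.default_rng(0)
# Stand-in for the Bidirectional LSTM output: (batch_size, time_steps, features)
x = rng.standard_normal((batch_size, time_steps, features))
# Dense kernel with shape (features, units), plus a bias of shape (units,)
kernel = rng.standard_normal((features, units))
bias = np.zeros(units)

# Route 1: contract the last axis of x with the feature axis of the kernel
out_tensordot = np.tensordot(x, kernel, axes=([2], [0])) + bias

# Route 2: flatten to 2D, do an ordinary matrix product, reshape back to 3D
flat = x.reshape(batch_size * time_steps, features)
out_flat = (flat @ kernel + bias).reshape(batch_size, time_steps, units)

print(out_tensordot.shape)                   # (30, 50, 200)
print(np.allclose(out_tensordot, out_flat))  # True
```

The same 75 x 200 kernel is applied independently at each of the 30 * 50 positions, so the layer has 75 * 200 + 200 = 15200 parameters regardless of batch size or sequence length.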
Another point regarding the output of the Embedding layer: you are right that it is a 3D output, but the shape corresponds to (BATCH_SIZE, input_length, output_dim), that is, (batch size, sequence length, embedding dimension), which is (30, 50, 64) here.
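The Embedding output shape can be illustrated with a small NumPy sketch, assuming the layer behaves as a table lookup with one row per vocabulary entry. The table and the token ids below are random placeholders, not values from a trained model.

```python
import numpy as np

voc_size, embed_dim = 10000, 64
batch_size, input_length = 30, 50

rng = np.random.default_rng(0)
# Lookup table: one embed_dim-sized vector per vocabulary entry
embedding_table = rng.standard_normal((voc_size, embed_dim))
# A batch of integer token ids: (batch_size, input_length)
token_ids = rng.integers(0, voc_size, size=(batch_size, input_length))

# Each id is replaced by its vector, which adds a trailing axis of size embed_dim
embedded = embedding_table[token_ids]
print(embedded.shape)  # (30, 50, 64)
```

The vocabulary size (10000) only determines the number of rows in the table; it does not appear in the output shape.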
Upvotes: 1