Reputation: 2140
This is my simple reproducible code:
from keras.callbacks import ModelCheckpoint
from keras.models import Model
from keras.models import load_model
import keras
import numpy as np
SEQUENCE_LEN = 45
LATENT_SIZE = 20
VOCAB_SIZE = 100
inputs = keras.layers.Input(shape=(SEQUENCE_LEN, VOCAB_SIZE), name="input")
encoded = keras.layers.Bidirectional(keras.layers.LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
decoded = keras.layers.RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = keras.layers.Bidirectional(keras.layers.LSTM(VOCAB_SIZE, return_sequences=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = keras.models.Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()
x = np.random.randint(0, 90, size=(10, SEQUENCE_LEN, VOCAB_SIZE))
y = np.random.normal(size=(10, SEQUENCE_LEN, VOCAB_SIZE))
NUM_EPOCHS = 1
checkpoint = ModelCheckpoint(filepath='checkpoint/{epoch}.hdf5')
history = autoencoder.fit(x, y, epochs=NUM_EPOCHS, callbacks=[checkpoint])
and here is my code to have a look at the weights in the encoder layer:
for epoch in range(1, NUM_EPOCHS + 1):
    file_name = "checkpoint/" + str(epoch) + ".hdf5"
    lstm_autoencoder = load_model(file_name)
    encoder = Model(lstm_autoencoder.input, lstm_autoencoder.get_layer('encoder_lstm').output)
    print(encoder.output_shape[1])
    weights = encoder.get_weights()[0]
    print(weights.shape)
    for idx in range(encoder.output_shape[1]):
        token_idx = np.argsort(weights[:, idx])[::-1]
Here print(encoder.output_shape) is (None, 20) and print(weights.shape) is (100, 80).
I understand that get_weights returns the learned weights of the layer. The part I do not get, based on this architecture, is the 80. What is it?
Also, are the weights here the ones that connect the encoder layer to the decoder, i.e. the connection between the encoder and the decoder? I had a look at this question here, but as it only deals with simple dense layers I could not carry the concept over to the seq2seq model.
Update 1
What is the difference, conceptually, between encoder.get_weights()[0] and encoder.get_weights()[1]? The first one has shape (100, 80) and the second one (20, 80).
Any help is appreciated :)
Upvotes: 0
Views: 954
Reputation: 33410
The encoder as you have defined it is a model, and it consists of two layers: an input layer and the 'encoder_lstm' layer, which is the bidirectional LSTM layer in the autoencoder. So its output shape would be the output shape of the 'encoder_lstm' layer, which is (None, 20) (because you have set LATENT_SIZE = 20 and merge_mode="sum"). So the output shape is correct and clear.
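For example, here is a minimal standalone sketch (separate from your model, names are illustrative) that only shows how merge_mode changes the output size of a Bidirectional layer:

import keras
units = 20
inp = keras.layers.Input(shape=(45, 100))
summed = keras.layers.Bidirectional(keras.layers.LSTM(units), merge_mode="sum")(inp)
concatenated = keras.layers.Bidirectional(keras.layers.LSTM(units), merge_mode="concat")(inp)
print(keras.models.Model(inp, summed).output_shape)        # (None, 20)
print(keras.models.Model(inp, concatenated).output_shape)  # (None, 40)

With merge_mode="sum" the forward and backward outputs are added element-wise, so the size stays at LATENT_SIZE; with "concat" it would double to 40.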
However, since encoder is a model, when you run encoder.get_weights() it returns the weights of all the layers in the model as a list. The bidirectional LSTM consists of two separate LSTM layers. Each of those LSTM layers has 3 weights: the kernel, the recurrent kernel and the biases. So encoder.get_weights() returns a list of 6 arrays, 3 for each of the LSTM layers. The first element of this list, which you have stored in weights and which is the subject of your question, is the kernel of one of the LSTM layers. The kernel of an LSTM layer has a shape of (input_dim, 4 * lstm_units). The input dimension of the 'encoder_lstm' layer is VOCAB_SIZE and its number of units is LATENT_SIZE. Therefore, we have (VOCAB_SIZE, 4 * LATENT_SIZE) = (100, 80) as the shape of the kernel. Likewise, the second element, encoder.get_weights()[1], is the recurrent kernel of that same LSTM layer, which maps the previous hidden state to the four gates, so its shape is (LATENT_SIZE, 4 * LATENT_SIZE) = (20, 80). Neither of these is a connection between the encoder and the decoder; they are internal to the encoder LSTM.
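To see this concretely, here is a small sketch that lists all six weight arrays of the encoder. It assumes the checkpoint from your code was saved as checkpoint/1.hdf5, and the forward-then-backward order used in the names reflects how the Bidirectional wrapper usually stores the weights of its two sub-layers:

from keras.models import load_model, Model

lstm_autoencoder = load_model("checkpoint/1.hdf5")
encoder = Model(lstm_autoencoder.input,
                lstm_autoencoder.get_layer("encoder_lstm").output)

# Each LSTM direction contributes three arrays: kernel, recurrent kernel, bias.
names = ["forward kernel", "forward recurrent kernel", "forward bias",
         "backward kernel", "backward recurrent kernel", "backward bias"]
for name, w in zip(names, encoder.get_weights()):
    print(name, w.shape)
# Expected shapes: kernels (100, 80), recurrent kernels (20, 80), biases (80,).
# The 80 comes from the 4 LSTM gates (input, forget, cell, output), each of size LATENT_SIZE = 20.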
Upvotes: 1