sariii

Reputation: 2140

interpreting get_weights in an LSTM model in Keras

This is my simple reproducible code:

from keras.callbacks import ModelCheckpoint
from keras.models import Model
from keras.models import load_model
import keras
import numpy as np
import os

SEQUENCE_LEN = 45
LATENT_SIZE = 20
VOCAB_SIZE = 100

inputs = keras.layers.Input(shape=(SEQUENCE_LEN, VOCAB_SIZE), name="input")
encoded = keras.layers.Bidirectional(keras.layers.LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
decoded = keras.layers.RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = keras.layers.Bidirectional(keras.layers.LSTM(VOCAB_SIZE, return_sequences=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = keras.models.Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()

x = np.random.randint(0, 90, size=(10, SEQUENCE_LEN, VOCAB_SIZE))
y = np.random.normal(size=(10, SEQUENCE_LEN, VOCAB_SIZE))
NUM_EPOCHS = 1
os.makedirs('checkpoint', exist_ok=True)  # make sure the checkpoint directory exists
checkpoint = ModelCheckpoint(filepath='checkpoint/{epoch}.hdf5')
history = autoencoder.fit(x, y, epochs=NUM_EPOCHS, callbacks=[checkpoint])

And here is the code I use to have a look at the weights of the encoder layer:

for epoch in range(1, NUM_EPOCHS + 1):
    file_name = "checkpoint/" + str(epoch) + ".hdf5"
    lstm_autoencoder = load_model(file_name)
    encoder = Model(lstm_autoencoder.input, lstm_autoencoder.get_layer('encoder_lstm').output)
    print(encoder.output_shape[1])
    weights = encoder.get_weights()[0]
    print(weights.shape)
    for idx in range(encoder.output_shape[1]):
        token_idx = np.argsort(weights[:, idx])[::-1]

Here, encoder.output_shape is (None, 20) and weights.shape is (100, 80).

My understanding is that get_weights returns the weights applied after the layer.

The part I do not understand, given this architecture, is the 80. What is it?

Also, are the weights here the ones that connect the encoder layer to the decoder, i.e. the connection between the encoder and the decoder?

I had a look at this question, but since it only deals with simple dense layers, I could not connect the concept to the seq2seq model.

Update 1

What is the difference, conceptually, between encoder.get_weights()[0] and encoder.get_weights()[1]? The first one has shape (100, 80) and the second one (20, 80).

Any help is appreciated :)

Upvotes: 0

Views: 954

Answers (1)

today

Reputation: 33410

The encoder, as you have defined it, is a model, and it consists of two layers: an input layer and the 'encoder_lstm' layer, which is the bidirectional LSTM layer of the autoencoder. So its output shape is the output shape of the 'encoder_lstm' layer, which is (None, 20) (because you have set LATENT_SIZE = 20 and merge_mode="sum"). So the output shape is correct and clear.
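As a quick check, here is a minimal sketch (reusing the autoencoder from your question) that builds the same encoder sub-model and prints its output shape:

from keras.models import Model

# the encoder sub-model contains only the input layer and the bidirectional LSTM
encoder = Model(autoencoder.input, autoencoder.get_layer('encoder_lstm').output)
print(encoder.output_shape)  # (None, 20): merge_mode="sum" adds the forward and backward outputs, so LATENT_SIZE dimensions remain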

However, since encoder is a model, running encoder.get_weights() returns the weights of all the layers in the model as a list. The bidirectional LSTM consists of two separate LSTM layers, and each of those LSTM layers has 3 weight arrays: the kernel, the recurrent kernel and the biases. So encoder.get_weights() returns a list of 6 arrays, 3 for each of the LSTM layers (the input layer has no weights). The first element of this list, which you have stored in weights and which is the subject of your question, is the kernel of one of the LSTM layers. The kernel of an LSTM layer has a shape of (input_dim, 4 * lstm_units); the factor of 4 is there because an LSTM computes four transformations of its input (for the input, forget and output gates plus the cell candidate), and their kernels are stored stacked in a single matrix. The input dimension of the 'encoder_lstm' layer is VOCAB_SIZE and its number of units is LATENT_SIZE, so the shape of the kernel is (VOCAB_SIZE, 4 * LATENT_SIZE) = (100, 80).
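To see how the full list of weights lines up with this, here is a small sketch (again reusing the encoder sub-model above; I am assuming the forward layer's weights come before the backward layer's, which is how the Bidirectional wrapper normally orders them, and that the kernel columns follow Keras's input gate / forget gate / cell candidate / output gate layout):

import numpy as np

names = ["kernel", "recurrent kernel", "bias"]
for i, w in enumerate(encoder.get_weights()):
    direction = "forward" if i < 3 else "backward"
    print(direction, names[i % 3], w.shape)
# forward kernel           (100, 80) -> (VOCAB_SIZE, 4 * LATENT_SIZE)
# forward recurrent kernel (20, 80)  -> (LATENT_SIZE, 4 * LATENT_SIZE), this is get_weights()[1]
# forward bias             (80,)     -> (4 * LATENT_SIZE,)
# ...and the same three shapes again for the backward LSTM

# The kernel stacks the weights of the four LSTM transformations side by side,
# so it can be split into four (VOCAB_SIZE, LATENT_SIZE) blocks, one per transformation:
kernel = encoder.get_weights()[0]                 # (100, 80)
W_i, W_f, W_c, W_o = np.split(kernel, 4, axis=1)  # input gate, forget gate, cell candidate, output gate
print(W_i.shape)                                  # (100, 20)

This also answers your update: encoder.get_weights()[1] is the recurrent kernel of the same LSTM layer, which maps the previous hidden state (of size LATENT_SIZE) to the same four stacked transformations, hence its (20, 80) shape.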

Upvotes: 1
