texaspythonic

Reputation: 73

Difference between two sequence-to-sequence models in Keras (with and without RepeatVector)

I am trying to understand the difference between the model described here, which is the following one:

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)            # (batch, latent_dim) - a single vector per sample

decoded = RepeatVector(timesteps)(encoded)    # (batch, timesteps, latent_dim) - vector repeated into a sequence
decoded = LSTM(input_dim, return_sequences=True)(decoded)

sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)

and the sequence-to-sequence model described here (the second description on that page).

What is the difference? The first one has the RepeatVector while the second one does not. Is the first model not taking the decoder's hidden state as the initial state for the prediction?

Is there a paper describing the first and the second one?

Upvotes: 4

Views: 932

Answers (1)

Daniel Möller

Reputation: 86620

In the model using RepeatVector, they're not using any kind of fancy prediction, nor dealing with states. They're letting the model do everything internally, and RepeatVector is used to transform a (batch, latent_dim) vector (which is not a sequence) into a (batch, timesteps, latent_dim) tensor (which now is a proper sequence).
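To see what RepeatVector does on its own, here is a minimal sketch (the sizes latent_dim = 4 and timesteps = 3 are just illustrative, not taken from the question):

import numpy as np
from keras.layers import Input, RepeatVector
from keras.models import Model

latent_dim, timesteps = 4, 3                # illustrative sizes

vec_in = Input(shape=(latent_dim,))         # (batch, latent_dim) - a single vector per sample
seq_out = RepeatVector(timesteps)(vec_in)   # (batch, timesteps, latent_dim) - that vector copied timesteps times
repeater = Model(vec_in, seq_out)

print(repeater.predict(np.array([[0., 1., 2., 3.]])))
# [[[0. 1. 2. 3.]
#   [0. 1. 2. 3.]
#   [0. 1. 2. 3.]]]

So the decoder LSTM receives the same latent vector at every time step and has to reconstruct the whole sequence from it in one forward pass.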

Now, in the other model, without RepeatVector, the secret lies in this additional function:

import numpy as np

def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence

This runs a loop driven by stop_condition, creating the time steps one by one. (The advantage of this is generating sentences without a fixed length.)

It also explicitly takes the states generated at each step and feeds them back into the decoder (in order to keep the proper connection between the individual steps).
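For reference, this is roughly how such a one-step inference decoder is wired so that the h and c returned by one predict call can be fed back as the initial state of the next call. The layer names and sizes below are illustrative assumptions that follow the tutorial's pattern, not code copied from it:

from keras.layers import Input, LSTM, Dense
from keras.models import Model

latent_dim, num_decoder_tokens = 256, 70    # illustrative sizes

decoder_inputs = Input(shape=(None, num_decoder_tokens))   # one token per call
state_h_in = Input(shape=(latent_dim,))                    # previous hidden state
state_c_in = Input(shape=(latent_dim,))                    # previous cell state

decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, h, c = decoder_lstm(decoder_inputs,
                                     initial_state=[state_h_in, state_c_in])
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

# One step in, token probabilities plus the updated states out:
decoder_model = Model([decoder_inputs, state_h_in, state_c_in],
                      [decoder_outputs, h, c])

Because the states are both inputs and outputs of this model, the while loop above can carry them from one generated character to the next.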


In short:

  • Model 1: creates the length by repeating the latent vector
  • Model 2: creates the length by looping new steps until a stop condition is reached

Upvotes: 4
