Dimensions between embedding layer and lstm encoder layer don't match

I am trying to build an encoder-decoder model for text generation, using LSTM layers with an embedding layer. The output of the embedding layer does not fit the LSTM encoder layer. The error I get is:

 ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 13, 128, 512)

My encoder data has shape (40, 13, 128) = (num_observations, max_encoder_seq_length, vocab_size), and embeddings_size/latent_dim = 512.

My questions are: how can I get rid of this 4th dimension between the embedding layer and the LSTM encoder layer, or in other words, how should I pass those 4 dimensions to the LSTM layer of the encoder model? As I am new to this topic, what should I also correct in the decoder LSTM layer?

I have read several posts, including this and this one and many others, but couldn't find a solution. It seems to me that the problem is not in the model but in the shape of the data. Any hint or remark about what could be wrong would be much appreciated. Thank you very much.

My model is the following, taken from this tutorial:

encoder_inputs = Input(shape=(max_encoder_seq_length,))
x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(max_decoder_seq_length,))
x = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Compile & run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# Note that `decoder_target_data` needs to be one-hot encoded,
# rather than sequences of integers like `decoder_input_data`!
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)

The summary of my model is:

Model: "functional_1"
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 13)]         0                                            
input_2 (InputLayer)            [(None, 15)]         0                                            
embedding (Embedding)           (None, 13, 512)      65536       input_1[0][0]                    
embedding_1 (Embedding)         (None, 15, 512)      65536       input_2[0][0]                    
lstm (LSTM)                     [(None, 512), (None, 2099200     embedding[0][0]                  
lstm_1 (LSTM)                   (None, 15, 512)      2099200     embedding_1[0][0]                
dense (Dense)                   (None, 15, 128)      65664       lstm_1[0][0]                     
Total params: 4,395,136
Trainable params: 4,395,136
Non-trainable params: 0


I am formatting my data in the following way:

for i, text in enumerate(input_texts):
    words = text.split()  # text is a sentence
    for t, word in enumerate(words):
        encoder_input_data[i, t, input_dict[word]] = 1.

Running decoder_input_data[:2] then gives:

array([[[0., 1., 0., ..., 0., 0., 0.],
        [0., 0., 1., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],
       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 1., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]], dtype=float32)

Akshay Sehgal
I am not sure what you are passing to the model as inputs and outputs, but this is what works. Please note the shapes of the encoder and decoder inputs I am passing: your inputs need to be in that shape for the model to run.

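import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

num_observations = 40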
max_encoder_seq_length = 13
max_decoder_seq_length = 15
num_encoder_tokens = 128
num_decoder_tokens = 128
latent_dim = 512
batch_size = 256
epochs = 5

encoder_inputs = Input(shape=(max_encoder_seq_length,))
x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(max_decoder_seq_length,))
x = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)


model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

encoder_input_data = np.random.random((1000,13))
decoder_input_data = np.random.random((1000,15))
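decoder_target_data = np.random.random((1000, 15, 128))

model.summary()
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)

Model: "functional_210"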
Layer (type)                    Output Shape         Param #     Connected to                     
input_176 (InputLayer)          [(None, 13)]         0                                            
input_177 (InputLayer)          [(None, 15)]         0                                            
embedding_33 (Embedding)        (None, 13, 512)      65536       input_176[0][0]                  
embedding_34 (Embedding)        (None, 15, 512)      65536       input_177[0][0]                  
lstm_94 (LSTM)                  [(None, 512), (None, 2099200     embedding_33[0][0]               
lstm_95 (LSTM)                  (None, 15, 512)      2099200     embedding_34[0][0]               
dense_95 (Dense)                (None, 15, 128)      65664       lstm_95[0][0]                    
Total params: 4,395,136
Trainable params: 4,395,136
Non-trainable params: 0
Epoch 1/5
4/4 [==============================] - 3s 853ms/step - loss: 310.7389 - val_loss: 310.3570
Epoch 2/5
4/4 [==============================] - 3s 638ms/step - loss: 310.6186 - val_loss: 310.3362
Epoch 3/5
4/4 [==============================] - 3s 852ms/step - loss: 310.6126 - val_loss: 310.3345
Epoch 4/5
4/4 [==============================] - 3s 797ms/step - loss: 310.6111 - val_loss: 310.3369
Epoch 5/5
4/4 [==============================] - 3s 872ms/step - loss: 310.6117 - val_loss: 310.3352
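
The 4D shape from the error message can be reproduced with plain NumPy by treating the embedding as a lookup table indexed by its input. This is only an analogy for illustration, not the Keras implementation, and the batch of 2 observations is an arbitrary choice to keep the arrays small. Each input element is replaced by a vector of length latent_dim, so the output always has one more axis than the input:

```python
import numpy as np

max_encoder_seq_length = 13
num_encoder_tokens = 128
latent_dim = 512

# Stand-in for the embedding weights: one row of size latent_dim per token.
table = np.zeros((num_encoder_tokens, latent_dim), dtype="float32")

# Integer token ids, shape (batch, seq_len): what Input(shape=(13,)) expects.
ids = np.zeros((2, max_encoder_seq_length), dtype="int32")
print(table[ids].shape)      # (2, 13, 512): 3D, accepted by the LSTM

# One-hot data, shape (batch, seq_len, vocab): every 0/1 entry is looked up.
one_hot = np.zeros((2, max_encoder_seq_length, num_encoder_tokens), dtype="int32")
print(table[one_hot].shape)  # (2, 13, 128, 512): 4D, as in the error message
```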

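The sequence data (text) needs to be passed to the inputs as integer-encoded sequences of shape (num_observations, max_seq_length), with one token index per position, not as one-hot arrays. The Embedding layer appends a dimension of size latent_dim to whatever it receives, so one-hot input of shape (None, 13, 128) comes out as (None, 13, 128, 512), which is the 4D tensor the LSTM rejects. The encoding can be done with something like the TextVectorization layer from Keras. Please read more about how to prepare text data for embedding layers and LSTMs here.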
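
As a minimal sketch of that preparation step, the loop from the question can write each word's index into a 2D array instead of setting a one-hot position in a 3D array. The helper name encode_texts, the toy input_dict, and the convention that index 0 is reserved for padding are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np

def encode_texts(texts, word_index, max_len):
    """Return an int array of shape (len(texts), max_len) holding word indices."""
    data = np.zeros((len(texts), max_len), dtype="int32")  # 0 = padding (assumed)
    for i, text in enumerate(texts):
        # Truncate sentences longer than max_len; shorter ones stay zero-padded.
        for t, word in enumerate(text.split()[:max_len]):
            data[i, t] = word_index[word]
    return data

# Toy vocabulary (hypothetical); real indices would come from the full corpus.
input_texts = ["the cat sat", "the dog"]
input_dict = {"the": 1, "cat": 2, "sat": 3, "dog": 4}

encoder_input_data = encode_texts(input_texts, input_dict, max_len=13)
print(encoder_input_data.shape)   # (2, 13)
print(encoder_input_data[0, :4])  # [1 2 3 0]
```

If a one-hot array already exists, calling argmax(axis=-1) on it recovers the indices, although all-zero padding rows also come back as 0. The decoder targets stay one-hot, as the comment in the model code notes, because the loss is categorical_crossentropy.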

Upvotes: 1

