Ossz

Reputation: 364

Input 0 is incompatible with layer repeat_vector_40: expected ndim=2, found ndim=1

I am developing an LSTM autoencoder model for anomaly detection. I have my keras model setup as below:

from keras.models import Sequential

from keras import Model, layers
from keras.layers import Layer, Conv1D, Input, Masking, Dense, RNN, LSTM, Dropout, RepeatVector, TimeDistributed, Reshape

def create_RNN_with_attention():
    x=Input(shape=(X_train_dt.shape[1], X_train_dt.shape[2]))
    RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)
    attention_layer = attention()(RNN_layer_1)
    dropout_layer_1 = Dropout(rate=0.2)(attention_layer)
    repeat_vector_layer = RepeatVector(n=X_train_dt.shape[1])(dropout_layer_1)
    RNN_layer_2 = LSTM(units=64, return_sequences=True)(repeat_vector_layer)
    dropout_layer_1 = Dropout(rate=0.2)(RNN_layer_2)
    output = TimeDistributed(Dense(X_train_dt.shape[2], trainable=True))(dropout_layer_1)
    model=Model(x,output)
    model.compile(loss='mae', optimizer='adam')    
    return model

Notice the attention layer that I added, attention_layer. Before adding it, the model compiled perfectly; after adding attention_layer, the model throws the following error: ValueError: Input 0 is incompatible with layer repeat_vector_40: expected ndim=2, found ndim=1

My attention layer is setup as follows:

import keras.backend as K
class attention(Layer):
    def __init__(self,**kwargs):
        super(attention,self).__init__(**kwargs)
 
    def build(self,input_shape):
        self.W=self.add_weight(name='attention_weight', shape=(input_shape[-1],1), 
                               initializer='random_normal', trainable=True)
        self.b=self.add_weight(name='attention_bias', shape=(input_shape[1],1), 
                               initializer='zeros', trainable=True)        
        super(attention, self).build(input_shape)
 
    def call(self,x):
        # Alignment scores. Pass them through tanh function
        e = K.tanh(K.dot(x,self.W)+self.b)
        # Remove dimension of size 1
        e = K.squeeze(e, axis=-1)   
        # Compute the weights
        alpha = K.softmax(e)
        # Reshape to tensorFlow format
        alpha = K.expand_dims(alpha, axis=-1)
        # Compute the context vector
        context = x * alpha
        context = K.sum(context, axis=1)
        return context

The idea of the attention mask is to allow the model to focus on the more prominent features as it trains.

Why am I getting the error above and how can I fix this?

Upvotes: 2

Views: 386

Answers (2)

Hosein Jokar

Reputation: 11

In the error shown, the problem is a mismatch in the input dimensions to the lstm_2 layer. This layer expects a three-dimensional input (batch_size, time_steps, features), but your input has fewer dimensions than that.

To solve this, make sure the input to this layer is three-dimensional. Since your model uses the attention layer, you must ensure that the output of that layer has the correct, matching shape, and adjust the model structure where necessary.

Also, the RepeatVector layer is not needed here and should be removed, because RepeatVector is usually not used in attention models. This layer repeats the input vector once for every output time step, but when an attention mechanism is used there is no need to repeat the vector, because the attention weights are already applied across all time steps.

More specifically, in your model the output of the LSTM in RNN_layer_1 is first taken with return_sequences=True. Then the attention mechanism (the attention layer) determines an importance for each time step. Finally, TimeDistributed(Dense) computes the output for each time step.
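For illustration only, here is a minimal sketch of that idea. It assumes the attention class from the question is changed to return the weighted sequence (the final K.sum over the time axis is dropped), so the time dimension survives and RepeatVector can be removed. The names sequence_attention and create_attention_autoencoder are made up for this sketch.

import keras.backend as K
from keras import Model
from keras.layers import Layer, Input, LSTM, Dropout, Dense, TimeDistributed

class sequence_attention(Layer):
    # Variant of the question's attention class that keeps the time axis.
    def build(self, input_shape):
        # input_shape: (batch, time_steps, units)
        self.W = self.add_weight(name='attention_weight', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='attention_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)
        super(sequence_attention, self).build(input_shape)

    def call(self, x):
        e = K.tanh(K.dot(x, self.W) + self.b)      # (batch, time_steps, 1)
        alpha = K.softmax(K.squeeze(e, axis=-1))   # (batch, time_steps)
        alpha = K.expand_dims(alpha, axis=-1)      # (batch, time_steps, 1)
        return x * alpha                           # weighted sequence, time axis preserved

def create_attention_autoencoder(time_steps, n_features):
    x = Input(shape=(time_steps, n_features))
    h = LSTM(64, return_sequences=True)(x)          # (batch, time_steps, 64)
    h = sequence_attention()(h)                     # (batch, time_steps, 64) - no RepeatVector needed
    h = Dropout(0.2)(h)
    h = LSTM(64, return_sequences=True)(h)
    h = Dropout(0.2)(h)
    output = TimeDistributed(Dense(n_features))(h)  # (batch, time_steps, n_features)
    model = Model(x, output)
    model.compile(loss='mae', optimizer='adam')
    return model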

Upvotes: 1

Marcin Możejko

Reputation: 40516

I think that the problem lies in this line:

RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)

This layer outputs a tensor of shape (batch_size, 64). That means you output a single vector and then run the attention mechanism over the batch dimension instead of the sequence dimension. It also means the output collapses to a single dimension, which is not acceptable for any Keras layer. This is why the RepeatVector layer raises the error: it expects an input of at least shape (batch_dimension, dim).

If you want to run attention mechanism over a sequence then you should switch the line mentioned above to:

RNN_layer_1 = LSTM(units=64, return_sequences=True)(x)
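For reference, a sketch of how the shapes then line up (shape comments assume X_train_dt has shape (samples, time_steps, features); apart from the one changed line this is the model from the question):

x = Input(shape=(X_train_dt.shape[1], X_train_dt.shape[2]))                 # (batch, time_steps, features)
RNN_layer_1 = LSTM(units=64, return_sequences=True)(x)                      # (batch, time_steps, 64)
attention_layer = attention()(RNN_layer_1)                                  # (batch, 64) - context vector summed over time
dropout_layer_1 = Dropout(rate=0.2)(attention_layer)                        # (batch, 64) - ndim=2, which RepeatVector accepts
repeat_vector_layer = RepeatVector(n=X_train_dt.shape[1])(dropout_layer_1)  # (batch, time_steps, 64)
RNN_layer_2 = LSTM(units=64, return_sequences=True)(repeat_vector_layer)    # (batch, time_steps, 64)
dropout_layer_1 = Dropout(rate=0.2)(RNN_layer_2)
output = TimeDistributed(Dense(X_train_dt.shape[2], trainable=True))(dropout_layer_1)  # (batch, time_steps, features)
model = Model(x, output)
model.compile(loss='mae', optimizer='adam')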

Upvotes: 3
