Reputation: 69
I want to use a model for classifying eight classes of images. I think using convolutional layers before recurrent layers can work for my problem, but placing a recurrent layer immediately after a convolutional or dense layer causes TensorFlow to raise the following error:
Input 0 is incompatible with layer simple_rnn_1: expected ndim=3, found ndim=2
I solved this problem by applying TensorFlow's expand_dims()
function inside a Lambda layer. It seems to work properly, but I want to be sure my model is working correctly. Despite looking at the relevant documentation, I couldn't understand what expand_dims()
does to make the model work.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dropout, Flatten, Dense, Lambda, SimpleRNN
from tensorflow import expand_dims
def create_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(50, 50, 1), padding='same',
                     activation='relu'))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    # Add a trailing axis so the 2-D (batch, features) tensor becomes
    # 3-D (batch, features, 1), which is what SimpleRNN expects.
    model.add(Lambda(lambda x: expand_dims(x, axis=-1)))
    model.add(SimpleRNN(64, return_sequences=False))
    model.add(Dense(units=8, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
I want to interpret the CNN output as sequential information using recurrent layers (LSTM, GRU, or other recurrent models). Am I doing this right by using a Lambda layer with expand_dims()?
Upvotes: 1
Views: 528
Reputation: 846
The Flatten layer reduces the image dimensions to (Batch Length, Feature Dimensions), while recurrent neural networks expect 3 input dimensions instead of 2, since they need an extra dimension for the time-series input, as in (Batch Length, Time Dimensions, Feature Dimensions).
The expand_dims call
simply adds an extra dimension that transforms your flattened output to (Batch Length, Feature Dimensions, 1). This means the RNN now works, and it treats the feature dimension as the time dimension of the data, going through the features in order and doing what you intended.
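For instance, here is a quick standalone sketch of what the Lambda layer does to the shapes; the batch size of 4 and the feature count of 7744 are just illustrative stand-ins for whatever Flatten produces in your model:
import tensorflow as tf

# Stand-in for the Flatten output: (Batch Length, Feature Dimensions)
flat = tf.zeros((4, 7744))
# What the Lambda layer does: append an axis of size 1
expanded = tf.expand_dims(flat, axis=-1)
print(flat.shape)      # (4, 7744)
print(expanded.shape)  # (4, 7744, 1)
# SimpleRNN then reads this as 7744 time steps, each a length-1 feature vector.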
Upvotes: 1