CNN LSTM Keras for video classification

Question

I have created a video dataset in which each video has dimensions 5 (frames) x 32 (width) x 32 (height) x 4 (channels). I'm trying to classify these videos (binary classification) using a CNN LSTM network, but I'm confused about the input shape and how I should reshape my dataset to train the network.

model = Sequential()
model.add(TimeDistributed(Conv2D(64, 5, activation='relu', padding='same', name='conv1', input_shape=??)))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='pool1')))

model.add(TimeDistributed(Conv2D(64, 5, activation='relu', padding='same', name='conv2')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='pool2')))

model.add(TimeDistributed(Conv2D(64, 5, activation='relu', padding='same', name='conv3')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='pool3')))

model.add(TimeDistributed(Conv2D(64, 5, activation='relu', padding='same', name='conv4')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', name='pool4')))


model.add(TimeDistributed(Flatten()))
model.add(LSTM(256, return_sequences=False, dropout=0.5))
model.add(Dense(1, activation='sigmoid'))

Am I missing anything in the model?

thushv89 · Accepted Answer

Your input should have the shape (batch_size, time steps, height, width, channels), so it should be a 5-dimensional tensor.
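As a rough illustration of getting a dataset into that 5-dimensional layout, here is a minimal NumPy sketch. The random `videos` list, the `labels` array, and the batch of 8 are made-up stand-ins, and it assumes each video is stored as (frames, width, height, channels) as described in the question. If your frames are already stored height-first, the transpose step is not needed.

```python
import numpy as np

# Hypothetical stand-in data: 8 videos, each stored as
# (frames, width, height, channels) = (5, 32, 32, 4).
rng = np.random.default_rng(0)
videos = [rng.random((5, 32, 32, 4), dtype=np.float32) for _ in range(8)]
labels = np.array([0, 1, 0, 1, 1, 0, 1, 0], dtype=np.float32)

# Stack along a new leading axis: (batch, frames, width, height, channels).
x = np.stack(videos, axis=0)

# Swap the width and height axes: (batch, frames, height, width, channels).
x = np.transpose(x, (0, 1, 3, 2, 4))

# One binary label per video, shaped (batch, 1) to match a Dense(1) output.
y = labels.reshape(-1, 1)

print(x.shape, y.shape)  # (8, 5, 32, 32, 4) (8, 1)
```

Because width and height are both 32 here, the transpose does not change the printed shape. It only changes which axis is treated as rows and which as columns.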

Also, pass the input_shape argument to the TimeDistributed layer rather than to the Conv2D layer, because TimeDistributed is the first layer in the model. input_shape leaves out the batch dimension, so the example below is the input shape for:

a batch of an arbitrary number of samples
5 time steps (video frames)
32 pixels tall (height)
32 pixels wide (width)
4 channels

model.add(TimeDistributed(Conv2D(64, 5, activation='relu', padding='same', name='conv1'), input_shape=(5, 32, 32, 4)))
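For reference, here is a sketch of how the full model from the question might look once the shape is fixed. It assumes TensorFlow's bundled Keras (`tensorflow.keras`) and declares the shape with an explicit `Input` layer, which has the same effect as passing `input_shape` to the first TimeDistributed layer. The optimizer, the loss, and the random batch of 2 videos are illustrative choices, not part of the original question.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Conv2D, Dense, Flatten, LSTM, MaxPooling2D, TimeDistributed
)

model = Sequential()
# Shape of one sample (no batch axis): (frames, height, width, channels).
model.add(Input(shape=(5, 32, 32, 4)))

# Four conv/pool stages, each applied to every frame independently.
# Spatial size per frame: 32 -> 16 -> 8 -> 4 -> 2.
for i in range(1, 5):
    model.add(TimeDistributed(
        Conv2D(64, 5, activation='relu', padding='same', name=f'conv{i}')))
    model.add(TimeDistributed(
        MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same',
                     name=f'pool{i}')))

# Flatten each frame's 2x2x64 feature map into a 256-value vector,
# giving the LSTM a sequence of shape (5, 256) per video.
model.add(TimeDistributed(Flatten()))
model.add(LSTM(256, return_sequences=False, dropout=0.5))
model.add(Dense(1, activation='sigmoid'))

# Assumed training setup for binary classification.
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

# Dummy batch of 2 videos, used only to check that the shapes line up.
x = np.random.rand(2, 5, 32, 32, 4).astype('float32')
preds = model.predict(x, verbose=0)
print(preds.shape)  # (2, 1)
```

Each prediction is a single sigmoid output per video, so it can be thresholded (for example at 0.5) to get the binary class.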
