Data Preprocessing - Input Shape for TimeDistributed CNN (LRCN) & ConvLSTM2D for Video Classification

Question

I'm trying to do binary classification for labeled data for 300+ videos. The goal is to extract features using a ConvNet and feed into to an LSTM for sequencing with a binary output after evaluating all the frames in the video. I've preprocessed each video to have exactly 200 frames with each image being 256 x 256 so that it would be easier to feed into a DNN and split the dataset into two folders as labels. (e.g. dog and cat)

However, after searching stackoverflow for hours, I'm still unsure how to reshape the dataset of video frames so that the model accounts for the number of frames. I'm trying to feed the video frames into a 3D ConvNets and TimeDistributed (2DConvNets) + LSTM, (e.g. (300, 200, 256, 256, 3) ) with no luck. I'm able to perform 2D ConvNet classification (data is a 4D Tensor, need to add a time step dimension to make it a 5D Tensor ) pretty easily but now having issues wrangling with the temporal aspect.

I've been using Keras ImageDataGenerator and train_datagen.flow_from_directory to read in the images and have been running into shape mismatch errors when I attempt to feed it to a TimeDistributed ConvNet. I know hypothetically if I have a X_train dataset I can potentially do X_train = X_train.reshape(...). Any example code would be very much appreciated.

Data Preprocessing - Input Shape for TimeDistributed CNN (LRCN) & ConvLSTM2D for Video Classification

Answers (1)

Related Questions

Data Preprocessing - Input Shape for TimeDistributed CNN (LRCN) &amp; ConvLSTM2D for Video Classification

Answers (1)

Related Questions

Data Preprocessing - Input Shape for TimeDistributed CNN (LRCN) & ConvLSTM2D for Video Classification