Reputation: 21
I'm trying to do binary classification on labeled data for 300+ videos. The goal is to extract features using a ConvNet and feed them into an LSTM for sequencing, with a binary output after evaluating all the frames in a video. I've preprocessed each video to have exactly 200 frames, each frame being 256 x 256, so that it would be easier to feed into a DNN, and split the dataset into two folders as labels (e.g. dog and cat).
However, after searching Stack Overflow for hours, I'm still unsure how to reshape the dataset of video frames so that the model accounts for the number of frames. I'm trying to feed the video frames into a 3D ConvNet and into TimeDistributed (2D ConvNets) + LSTM, e.g. with shape (300, 200, 256, 256, 3), with no luck. I'm able to perform 2D ConvNet classification pretty easily (the data is a 4D tensor; I need to add a time-step dimension to make it a 5D tensor), but now I'm having issues wrangling the temporal aspect.
I've been using Keras's ImageDataGenerator and train_datagen.flow_from_directory to read in the images, and I keep running into shape-mismatch errors when I try to feed the batches to a TimeDistributed ConvNet. I know that, hypothetically, if I have an X_train dataset I can do X_train = X_train.reshape(...). Any example code would be very much appreciated.
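For context, here is a toy version of the reshape I mean (2 videos instead of 300 so it fits in memory; all names are placeholders):

import numpy as np

# flow_from_directory yields 4D batches (n_frames, 256, 256, 3) with no
# notion of which video each frame belongs to, which is where I get stuck.
all_frames = np.zeros((2 * 200, 256, 256, 3), dtype=np.uint8)

# Add the time-step dimension: (videos, frames, height, width, channels)
X_train = all_frames.reshape(2, 200, 256, 256, 3)
print(X_train.shape)  # (2, 200, 256, 256, 3)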
Upvotes: 2
Views: 1251
Reputation: 220
I think you could use ConvLSTM2D in Keras for your purpose. ImageDataGenerator is very good for a CNN with images, but may not be convenient for a CRNN with videos.
You have already transformed your 300 videos into the same shape (200, 256, 256, 3): each video 200 frames, each frame 256x256 RGB. Next, you need to load them into a NumPy array of shape (300, 200, 256, 256, 3). For reading videos into NumPy arrays, see this answer.
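If it helps, here is one way that array could be built with OpenCV (a minimal sketch, assuming each preprocessed video is a file readable by cv2.VideoCapture; video_paths is a placeholder):

import cv2
import numpy as np

video_paths = [...]  # paths to your 300 preprocessed videos, in a fixed order

# ~12 GB as uint8 for 300 videos; switch to a generator if memory is tight
data = np.empty((len(video_paths), 200, 256, 256, 3), dtype=np.uint8)
for i, path in enumerate(video_paths):
    cap = cv2.VideoCapture(path)
    for f in range(200):
        ok, frame = cap.read()  # BGR frame, (256, 256, 3) after your preprocessing
        if not ok:
            raise ValueError("%s has fewer than 200 frames" % path)
        data[i, f] = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    cap.release()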
Then you can feed the data into a CRNN. Its first ConvLSTM2D layer should have input_shape = (200, 256, 256, 3); note that Keras's input_shape excludes the batch dimension.
A sample based on your data (illustrative only, not tested):
from keras.models import Sequential
from keras.layers import ConvLSTM2D, Flatten, Dense

model = Sequential()
# input_shape excludes the batch dimension: (frames, height, width, channels)
model.add(ConvLSTM2D(filters=32, kernel_size=(5, 5),
                     input_shape=(200, 256, 256, 3)))
### model.add(...more layers)
model.add(Flatten())  # ConvLSTM2D outputs a 4D tensor; flatten it for Dense
model.add(Dense(units=num_of_categories,  # number of your video categories
                kernel_initializer='orthogonal', activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# then train it
model.fit(video_data,  # shape (300, 200, 256, 256, 3)
          labels,      # one-hot targets, shape (300, num_of_categories)
          batch_size=20,
          epochs=50,
          validation_split=0.1)
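One note on the targets: categorical_crossentropy expects one-hot vectors, so if your labels are integer class indices (e.g. 0 for dog, 1 for cat), keras.utils.to_categorical converts them; raw_labels below is a placeholder:

from keras.utils import to_categorical

# raw_labels: one integer class index per video, shape (300,)
labels = to_categorical(raw_labels, num_classes=num_of_categories)  # (300, num_of_categories)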
I hope this helps a little.
Upvotes: 1