Mark Mahill

Reputation: 43

How to structure my video dataset based on extracted features for building a CNN-LSTM classification model?

For my project, which deals with emotion recognition, I have a dataset consisting of multiple videos ranging from 0.5 s to 10 s in length. I have an application that goes through each video and creates a .csv file containing the features it has extracted from each frame, i.e., each row represents one frame of the video (so the number of rows varies from video to video) and the columns represent the different features the application has extracted from that frame (the number of columns is fixed). Each .csv filename also contains a code representing the emotion being expressed in the video.
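For reference, this is roughly how I read one of these files at the moment; the filename pattern and emotion codes below are just placeholders for whatever the extraction tool actually writes:

    import re
    from pathlib import Path

    import pandas as pd

    # placeholder emotion codes -- the real ones come from the tool's filename
    # convention, e.g. "clip_HA_01.csv" for a happy clip
    EMOTION_CODES = {'HA': 0, 'SA': 1, 'AN': 2, 'FE': 3, 'SU': 4, 'NE': 5}

    def load_video_csv(path):
        """Return the (n_frames, n_features) array and the emotion label for one video."""
        path = Path(path)
        features = pd.read_csv(path).to_numpy(dtype='float32')
        code = re.search(r'_([A-Z]{2})_', path.name).group(1)   # placeholder pattern
        return features, EMOTION_CODES[code]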

Initially, my plan was to extract each frame from the video and pass the frames as input to the following CNN-LSTM model (CNN for the spatial features, LSTM for the temporal features):

    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (Input, Conv3D, AveragePooling3D,
                                         Reshape, Dense, CuDNNLSTM)
    # note: in TF 2.x, CuDNNLSTM lives under tf.compat.v1.keras.layers

    model = Sequential()

    model.add(Input(shape=input_shape))

    # frame-wise spatial convolutions (kernel and stride of 1 along the time axis)
    model.add(Conv3D(6, (1, 5, 5), (1, 1, 1), activation='relu', name='conv-1'))
    model.add(AveragePooling3D((1, 2, 2), strides=(1, 2, 2), name='avgpool-1'))

    model.add(Conv3D(16, (1, 5, 5), (1, 1, 1), activation='relu', name='conv-2'))
    model.add(AveragePooling3D((1, 2, 2), strides=(1, 2, 2), name='avgpool-2'))

    model.add(Conv3D(32, (1, 5, 5), (1, 1, 1), activation='relu', name='conv-3'))
    model.add(AveragePooling3D((1, 2, 2), strides=(1, 2, 2), name='avgpool-3'))

    model.add(Conv3D(64, (1, 4, 4), (1, 1, 1), activation='relu', name='conv-4'))

    # collapse the spatial dimensions into a (timesteps, features) sequence
    model.add(Reshape((30, 64), name='reshape'))

    # temporal modelling over the frame sequence
    model.add(CuDNNLSTM(64, return_sequences=True, name='lstm-1'))
    model.add(CuDNNLSTM(64, name='lstm-2'))

    # 6 emotion classes
    model.add(Dense(6, activation=tf.nn.softmax, name='result'))

I still plan on using a CNN-LSTM model, but I don't know how to structure my dataset now. I thought of labelling each frame in each .csv file with the corresponding emotion label and then combining all the .csv files into a single .csv file. This combined .csv file would then be passed to the above model, after changing the input shape and other necessary parameters, but I don't know whether the model would still be able to differentiate between the videos if it is done that way.
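For illustration, the combined file I have in mind would be built something like this, reusing the load_video_csv helper sketched earlier: each frame keeps its emotion label, plus a video identifier column so I can still tell which frames belong to which video (the directory and column names are placeholders):

    import glob

    import pandas as pd

    frames = []
    for path in glob.glob('features/*.csv'):        # placeholder directory
        df = pd.read_csv(path)
        _, label = load_video_csv(path)             # label parsing sketched above
        df['emotion'] = label                       # per-frame emotion label
        df['video_id'] = path                       # keeps frames of different videos separable
        frames.append(df)

    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv('combined.csv', index=False)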

So, to conclude, I need help with structuring my dataset and with how this dataset should be passed to a CNN-LSTM model.

Upvotes: 0

Views: 217

Answers (1)

Sachin Prasad H S

Reputation: 146

Looking at your problem statement, I don't think there is a need to differentiate between the videos.

You can go ahead with your approach of labeling each frame in the video and combining everything into a single CSV file.

You can use the code below to convert the CSV file into NumPy arrays and prepare your data for training.

import numpy as np
import pandas as pd

data = pd.read_csv('input.csv')

width, height = 48, 48

# each entry in the 'pixels' column is a space-separated string of pixel values
datapoints = data['pixels'].tolist()

#getting features for training
X = []
for xseq in datapoints:
    xx = [int(xp) for xp in xseq.split(' ')]
    xx = np.asarray(xx).reshape(width, height)
    X.append(xx.astype('float32'))

X = np.asarray(X)
X = np.expand_dims(X, -1)   # add a channel dimension: (samples, 48, 48, 1)

#getting one-hot labels for training
y = pd.get_dummies(data['emotion']).to_numpy()

#storing them using numpy
np.save('fdataX', X)
np.save('flabels', y)
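Note that np.save appends a .npy extension, so the saved arrays can be loaded back and split for training like this (a minimal sketch; using scikit-learn's train_test_split here is just one option, not something required by the approach above):

import numpy as np
from sklearn.model_selection import train_test_split

# np.save adds the .npy suffix automatically
X = np.load('fdataX.npy')
y = np.load('flabels.npy')

# hold out 20% of the samples for validation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)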

Upvotes: 0
