Tulsi

Reputation: 719

CNN -> LSTM network for videos

I have X videos, and each video has a different number of frames, say Y(x). The frame size is the same for all videos: 224x224x3. I pass each frame through a CNN, which outputs a feature vector of length 1024. Now I want to pass these features to an LSTM. An LSTM requires batch_size, time_steps, and number_of_features. How should I decide those values? I have two configurations in mind but do not know how to proceed:

  1. Should I break the 1024 features into 32 x 32, using 32 as time_steps and 32 as number_of_features, with batch_size being the number of frames?

  2. Or should time_steps correspond to the number of frames, with number_of_features being 1024 and batch_size (?)

Upvotes: 0

Views: 1060

Answers (2)

user9165727
user9165727

Reputation:

Consider building a model with Keras layers, where you can stack all the layers like this:

from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, MaxPooling2D, Flatten, LSTM, Dense

model = Sequential()
model.add(TimeDistributed(Conv2D(...)))       # the same CNN is applied to every frame
model.add(TimeDistributed(MaxPooling2D(...)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(..., return_sequences=False))  # or True, in case of stacked LSTMs; the LSTM itself is not wrapped in TimeDistributed
model.add(Dense(...))
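
As a concrete instance of that skeleton, here is a minimal sketch assuming 40 frames of 224x224x3 per video; the filter counts, LSTM width, and class count are placeholders:

from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, MaxPooling2D, Flatten, LSTM, Dense

model = Sequential()
# TimeDistributed applies the same CNN to every frame;
# the input shape is (time_steps, height, width, channels)
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu'),
                          input_shape=(40, 224, 224, 3)))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(256))                        # collapses the time dimension into one vector
model.add(Dense(10, activation='softmax'))  # e.g. 10 video classes
model.compile(loss='categorical_crossentropy', optimizer='adam')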

You can also preprocess the videos directly with OpenCV: read a fixed number of frames from each video and store them in one big tensor, which you can then split with sklearn's train_test_split, like this:

import os
import cv2
import numpy as np

video_folder = '/path.../'
X_data = []
y_data = []
list_of_videos = os.listdir(video_folder)

for i in list_of_videos:
    # Path to each video from os.listdir(video_folder)
    vid = os.path.join(video_folder, i)
    # Reading the video
    cap = cv2.VideoCapture(vid)
    # fps = cap.get(cv2.CAP_PROP_FPS)
    # To store frames
    frames = []
    for _ in range(40):  # here we read 40 frames, for example
        ret, frame = cap.read()
        if ret:
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # converting to gray
            frame = cv2.resize(frame, (30, 30), interpolation=cv2.INTER_AREA)
            frames.append(frame)
        else:
            print('Error!')
    cap.release()
    X_data.append(frames)  # appending each tensor of 40 frames resized to 30x30
    y_data.append(1)       # appending a class label to the set of 40 frames

X_data = np.array(X_data)
y_data = np.array(y_data)  # ready to split! :)
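
For the split itself, a minimal sketch using scikit-learn (test_size and random_state are arbitrary example values):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_data, y_data, test_size=0.2, random_state=42)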

I hope this helps you! :)

Upvotes: 1

Ishant Mrinal

Reputation: 4918

It depends on the problem you are trying to solve.

Action classification using videos?

If you are trying to predict the action/event from the whole video, use num_of_frames as time_steps, and batch_size will be the number of videos you want to process together.
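
For example, a minimal sketch of this layout, assuming the 1024-dimensional CNN features are precomputed; the number of videos, frame count, LSTM width, and class count are hypothetical:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

num_videos, num_frames, num_features = 8, 40, 1024  # hypothetical sizes
cnn_features = np.random.rand(num_videos, num_frames, num_features).astype('float32')

model = Sequential()
model.add(LSTM(256, input_shape=(num_frames, num_features)))  # one prediction per video
model.add(Dense(10, activation='softmax'))                    # e.g. 10 action classes
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()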

Per-frame object classification?

In this case you can split each 1024-feature vector into 32 x 32, using 32 as time_steps and 32 as number_of_features; batch_size then corresponds to the number of frames.
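
A minimal reshape sketch of that layout (the frame count is hypothetical and the features are dummy data):

import numpy as np

num_frames = 40  # hypothetical number of frames for one video
features = np.random.rand(num_frames, 1024).astype('float32')  # per-frame CNN features

# Split each 1024-vector into 32 time_steps of 32 features;
# the batch dimension is then the number of frames
lstm_input = features.reshape(num_frames, 32, 32)
print(lstm_input.shape)  # (40, 32, 32) -> (batch_size, time_steps, number_of_features)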

Upvotes: 1
