Reputation: 719
I have X videos, and each video has a different number of frames, say Y(x). Frame size is the same for all videos: 224x224x3. I pass each frame through a CNN, which outputs a feature vector of length 1024. Now I want to feed these into an LSTM, which requires batch_size, time_steps, and number_of_features. How should I choose those values? I have two configurations in mind but don't know how to proceed.
Should I split the 1024-dimensional vector into 32 x 32 to define time_steps and number_of_features, with batch_size being the number of frames?
Or should time_steps correspond to the number of frames, number_of_features be 1024, and batch_size be (?)
Upvotes: 0
Views: 1060
Reputation:
Consider building a model with Keras layers, stacking them like this:
model = Sequential()
model.add(TimeDistributed(Conv2D(...)))
model.add(TimeDistributed(MaxPooling2D(...)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(..., return_sequences=False)) # or True, for stacked LSTMs; don't wrap the LSTM in TimeDistributed, it consumes the time dimension itself
model.add(Dense(...))
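A fleshed-out version of that stack, as a minimal sketch: the filter counts, LSTM units, and the (40, 30, 30, 1) input shape (40 grayscale 30x30 frames, matching the preprocessing snippet) are illustrative assumptions, not requirements:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv2D,
                                     MaxPooling2D, Flatten, LSTM, Dense)

model = Sequential()
# CNN applied to each of the 40 frames independently
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu'),
                          input_shape=(40, 30, 30, 1)))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
model.add(TimeDistributed(Flatten()))
# LSTM runs over the 40 per-frame feature vectors
model.add(LSTM(64, return_sequences=False))
model.add(Dense(1, activation='sigmoid'))  # binary label per video
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()
```

The model outputs one prediction per video, shape (batch_size, 1).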
And try to preprocess the videos directly with OpenCV: read a fixed number of frames from each video and store them in one big tensor that you can split with sklearn's train_test_split, like this:
import os
import cv2
import numpy as np

video_folder = '/path.../'
X_data = []
y_data = []
list_of_videos = os.listdir(video_folder)
for i in list_of_videos:
    # Path to each video
    vid = str(video_folder + i)
    # Open the video
    cap = cv2.VideoCapture(vid)
    # fps = cap.get(cv2.CAP_PROP_FPS)
    # Frames of the current video
    frames = []
    for j in range(40):  # here we read 40 frames, for example
        ret, frame = cap.read()
        if ret == True:
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # convert to grayscale
            frame = cv2.resize(frame, (30, 30), interpolation=cv2.INTER_AREA)
            frames.append(frame)
        else:
            print('Error reading frame!')
    cap.release()
    X_data.append(frames)  # append the tensor of 40 frames resized to 30x30
    y_data.append(1)       # append a class label for this set of 40 frames

X_data = np.array(X_data)
y_data = np.array(y_data)  # ready to split! :)
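With X_data and y_data built, the split mentioned above is one line. A sketch with stand-in arrays of the same shape the loop produces (the 0.2 test fraction is just an example):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins for the real arrays: (num_videos, 40, 30, 30)
X_data = np.zeros((10, 40, 30, 30))
y_data = np.ones(10)

# Hold out 20% of the videos for testing
X_train, X_test, y_train, y_test = train_test_split(
    X_data, y_data, test_size=0.2, random_state=42)

print(X_train.shape)  # (8, 40, 30, 30)
print(X_test.shape)   # (2, 40, 30, 30)
```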
I hope this helps! :)
Upvotes: 1
Reputation: 4918
So it depends on the problem you are trying to solve.

Action classification from videos?

If you are trying to predict the action/event from the whole video, use num_of_frames as time_steps, and batch_size will be the number of videos you want to process together.
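Concretely, for action classification the LSTM input is shaped (batch_size, num_of_frames, 1024). A sketch with assumed example values (8 videos, 40 frames each):

```python
import numpy as np

batch_size, num_of_frames, num_of_features = 8, 40, 1024  # example values
# One 1024-d CNN feature vector per frame, per video
cnn_features = np.random.rand(batch_size, num_of_frames, num_of_features)
print(cnn_features.shape)  # (8, 40, 1024) -> (batch, time_steps, features)
```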
Per-frame object classification?

In this case you can split each 1024-dimensional feature vector into 32x32, using 32 as time_steps and 32 as number_of_features, with batch_size being the number of frames.
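The 32x32 split for the per-frame case is a plain reshape; a sketch assuming 40 frames:

```python
import numpy as np

num_frames = 40  # example value
features = np.random.rand(num_frames, 1024)  # one CNN vector per frame
# Split each 1024-d vector into 32 time_steps of 32 features
lstm_input = features.reshape(num_frames, 32, 32)
print(lstm_input.shape)  # (40, 32, 32) -> (batch, time_steps, features)
```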
Upvotes: 1