Input fixed length sequence of frames to CNN

Question

I want my PyTorch CNN to take as input a sequence of SEQ_LEN 32x32 RGB images concatenated along the channel dimension, so a single input to the network has shape (32, 32, 3, SEQ_LEN). How should I define the input layer of my CNN?

The common way

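import numpy as np
import torch
import torch.nn as nn

SEQ_LEN = 10
# Treats the sequence length as the number of input channels.
input_conv = nn.Conv2d(in_channels=SEQ_LEN, out_channels=32, kernel_size=3)

BATCH_SIZE = 64
# Batch of frame sequences with shape (BATCH_SIZE, SEQ_LEN, 3, 32, 32).
frames = np.random.randint(0, 256, size=(BATCH_SIZE, SEQ_LEN, 3, 32, 32))
frames_tensor = torch.tensor(frames)

# Conv2d expects a 4-D input (N, C, H, W), so this 5-D tensor is rejected.
input_conv(frames_tensor)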

gives the error

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 10, 3, 3], but got 5-dimensional input of size [64, 10, 3, 32, 32] instead
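
The error says that nn.Conv2d only accepts 4-D input of shape (N, C, H, W). Below is a minimal sketch of two possible ways around it, not a definitive answer. It assumes the frames arrive as (BATCH_SIZE, SEQ_LEN, 3, 32, 32), as in the snippet above, and that float32 input is acceptable. The names CHANNELS, stacked, and volume are introduced here for illustration only.

```python
import torch
import torch.nn as nn

SEQ_LEN = 10
BATCH_SIZE = 64
CHANNELS = 3  # RGB (name introduced for this sketch)

# Random pixel values in [0, 255], cast to float to match the float32 conv weights.
frames = torch.randint(0, 256, (BATCH_SIZE, SEQ_LEN, CHANNELS, 32, 32)).float()

# Option 1: fold the sequence axis into the channel axis,
# giving (N, SEQ_LEN * 3, H, W), and set in_channels accordingly.
stacked = frames.reshape(BATCH_SIZE, SEQ_LEN * CHANNELS, 32, 32)
conv2d = nn.Conv2d(in_channels=SEQ_LEN * CHANNELS, out_channels=32, kernel_size=3)
out2d = conv2d(stacked)
print(out2d.shape)  # torch.Size([64, 32, 30, 30])

# Option 2: keep time as a separate axis and use Conv3d,
# which expects (N, C, D, H, W), so move channels ahead of the sequence axis.
volume = frames.permute(0, 2, 1, 3, 4)  # (N, 3, SEQ_LEN, H, W)
conv3d = nn.Conv3d(in_channels=CHANNELS, out_channels=32, kernel_size=3)
out3d = conv3d(volume)
print(out3d.shape)  # torch.Size([64, 32, 8, 30, 30])
```

Option 1 matches the "concatenated along channels" description in the question: the first layer mixes all frames at once and the temporal order is encoded only by channel position. Option 2 also convolves along the time axis, which costs more compute but keeps the temporal structure explicit.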

