Federico Taschin

Reputation: 2205

Feeding a fixed-length sequence of frames to a CNN

I want my PyTorch CNN to take as input a sequence of SEQ_LEN 32x32 RGB images concatenated along the channel dimension, so a single input to the network has shape (32, 32, 3, SEQ_LEN). How should I define the CNN's input layer?

The usual approach:

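import numpy as np
import torch
import torch.nn as nn

SEQ_LEN = 10
# in_channels=SEQ_LEN treats each frame in the sequence as one input channel
input_conv = nn.Conv2d(in_channels=SEQ_LEN, out_channels=32, kernel_size=3)

BATCH_SIZE = 64
# Random frames laid out as (batch, seq_len, rgb, height, width)
frames = np.random.randint(0, 255, size=(BATCH_SIZE, SEQ_LEN, 3, 32, 32))
frames_tensor = torch.tensor(frames)

input_conv(frames_tensor)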

gives the error

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 10, 3, 3], but got 5-dimensional input of size [64, 10, 3, 32, 32] instead

Upvotes: 2

Views: 379

Answers (1)

gspr

Reputation: 11227

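Given your comments, it sounds like your data does not fit a 2D convolutional neural network at all, and that a 3D one (nn.Conv3d) would be more appropriate. As its documentation shows, Conv3d takes input of shape (N, C_in, D, H, W), so the frame sequence gets its own depth dimension D next to the RGB channel dimension, which is the layout you would expect for a stack of color frames.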
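As a minimal sketch of what that could look like, assuming the (batch, SEQ_LEN, 3, 32, 32) layout from the question: Conv3d wants the RGB channels ahead of the sequence axis, so the tensor is permuted first. The second half shows a possible alternative that keeps Conv2d by folding the sequence into the channel axis, which may be closer to "concatenated along the channel dimension". The names conv3d, conv2d, out3d and out2d are just illustrative, and the frames are random floats because convolution layers need floating-point input.

```python
import torch
import torch.nn as nn

SEQ_LEN = 10
BATCH_SIZE = 64

# Random float frames in the question's layout: (batch, seq_len, rgb, height, width)
frames = torch.rand(BATCH_SIZE, SEQ_LEN, 3, 32, 32)

# Option 1: Conv3d expects (N, C_in, D, H, W), so swap the sequence and RGB axes.
conv3d = nn.Conv3d(in_channels=3, out_channels=32, kernel_size=3)
out3d = conv3d(frames.permute(0, 2, 1, 3, 4))
print(out3d.shape)  # torch.Size([64, 32, 8, 30, 30])

# Option 2 (assumption: frame order does not need its own axis):
# merge sequence and RGB into SEQ_LEN * 3 channels and keep Conv2d.
conv2d = nn.Conv2d(in_channels=SEQ_LEN * 3, out_channels=32, kernel_size=3)
out2d = conv2d(frames.reshape(BATCH_SIZE, SEQ_LEN * 3, 32, 32))
print(out2d.shape)  # torch.Size([64, 32, 30, 30])
```

With Conv3d the kernel also slides along the time axis, so the output keeps a (shortened) sequence dimension. With the Conv2d variant the first layer mixes all frames at once and the time axis disappears after it.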

Upvotes: 1
