Reputation: 207
I have an input of shape 14 x 10 x 128 x 128, where 14 is the batch size, 10 is the sequence length, and each item in the sequence has shape 128 x 128. I want to map this input to an output of shape 14 x 10 x 128, i.e., for each item in the sequence I want to learn 128 binary classifiers.
Does the following model make sense? First I reshape my input to 140 x 128 x 128, pass it through the model, and then reshape the output back to 14 x 10 x 128.
classifier = nn.Sequential(
    nn.Conv1d(128, 128, 1),
    nn.ReLU(),
    nn.BatchNorm1d(128),
    nn.Conv1d(128, 128, 1),
    nn.ReLU(),
    nn.BatchNorm1d(128),
    nn.Conv1d(128, 1, 1)
)
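For reference, a quick sanity check of the shapes this pipeline produces (a sketch with random dummy data, assuming the reshape described above):

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Conv1d(128, 128, 1),
    nn.ReLU(),
    nn.BatchNorm1d(128),
    nn.Conv1d(128, 128, 1),
    nn.ReLU(),
    nn.BatchNorm1d(128),
    nn.Conv1d(128, 1, 1)
)

x = torch.rand(14, 10, 128, 128)         # batch x sequence x 128 x 128
out = classifier(x.view(-1, 128, 128))   # collapse to 140 x 128 x 128; final Conv1d gives 140 x 1 x 128
out = out.view(14, 10, 128)              # expand back to batch x sequence x 128
print(out.shape)                         # torch.Size([14, 10, 128])
```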
Thank you.
Upvotes: 1
Views: 115
Reputation: 40668
Not really convinced a 1D convolution will get you anywhere, since it only operates along a single spatial dimension. In your case, you are dealing with a sequence of 2D elements, so nn.Conv2d seems more appropriate for this kind of task.
You are looking to do a one-to-one mapping between input and output sequence elements, so you can treat each element as an independent instance. A straightforward approach is then to collapse the sequence into the batch axis and use a CNN followed by a fully-connected layer.
Here is a very minimal example with a single layer:
model = nn.Sequential(
    nn.Conv2d(1, 8, 2),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.LazyLinear(128)
)
This requires you to reshape the tensor before and after to collapse and expand the sequence dimensions:
>>> x = torch.rand(14, 10, 128, 128)
>>> y = model(x.view(-1, 1, 128, 128)).view(-1, 10, 128)
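To see why the reshapes work, here is a sketch tracing the intermediate shapes through the same layers (the functional-style calls are just for illustration):

```python
import torch
import torch.nn as nn

x = torch.rand(14, 10, 128, 128)
x = x.view(-1, 1, 128, 128)       # 140 x 1 x 128 x 128 (sequence folded into batch)
x = nn.Conv2d(1, 8, 2)(x)         # 140 x 8 x 127 x 127 (2x2 kernel, no padding)
x = nn.ReLU()(x)
x = nn.AdaptiveAvgPool2d(1)(x)    # 140 x 8 x 1 x 1 (global average pooling)
x = nn.Flatten()(x)               # 140 x 8
x = nn.LazyLinear(128)(x)         # 140 x 128 (in_features inferred on first call)
y = x.view(-1, 10, 128)           # 14 x 10 x 128 (sequence unfolded from batch)
print(y.shape)                    # torch.Size([14, 10, 128])
```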
Upvotes: 1