ecreif

Reputation: 1182

Why is my pytorch Autoencoder giving me a "mat1 and mat2 shapes cannot be multiplied" error?

I know this means the shapes don't match for the matrix multiplication, but why does it happen when my code is similar to most of the example code I found:

import torch.nn as nn
...
#input is a 256x256 image
num_input_channels = 3
self.encoder = nn.Sequential(
            nn.Conv2d(num_input_channels*2**0, num_input_channels*2**1, kernel_size=3, padding=1, stride=2), #1 6 128 128
            nn.Tanh(),
            nn.Conv2d(num_input_channels*2**1, num_input_channels*2**2, kernel_size=3, padding=1, stride=2), #1 12 64 64
            nn.Tanh(),
            nn.Conv2d(num_input_channels*2**2, num_input_channels*2**3, kernel_size=3, padding=1, stride=2), #1 24 32 32
            nn.Tanh(),
            nn.Conv2d(num_input_channels*2**3, num_input_channels*2**4, kernel_size=3, padding=1, stride=2), #1 48 16 16
            nn.Tanh(),
            nn.Conv2d(num_input_channels*2**4, num_input_channels*2**5, kernel_size=3, padding=1, stride=2), #1 96 8 8
            nn.Tanh(),
            nn.Conv2d(num_input_channels*2**5, num_input_channels*2**6, kernel_size=3, padding=1, stride=2), #1 192 4 4
            nn.LeakyReLU(),
            nn.Conv2d(num_input_channels*2**6, num_input_channels*2**7, kernel_size=3, padding=1, stride=2), #1 384 2 2
            nn.LeakyReLU(),
            nn.Conv2d(num_input_channels*2**7, num_input_channels*2**8, kernel_size=2, padding=0, stride=1), #1 768 1 1
            nn.LeakyReLU(),
            nn.Flatten(),
            nn.Linear(768, 1024*32),
            nn.ReLU(),
            nn.Linear(1024*32, 256),
            nn.ReLU(),
        ).cuda()

I get the error "RuntimeError: mat1 and mat2 shapes cannot be multiplied (768x1 and 768x32768)"

To my understanding I should end up with a tensor of shape [1, 768, 1, 1] after the convolutions and [1, 768] after flattening, so I can feed it into a fully connected Linear layer with output size 1024*32 (with which I tried to give the network some more capacity to store data/knowledge).
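
For reference, this is the quick size arithmetic I based that expectation on (just a sanity-check sketch, not my actual code):

size, channels = 256, 3  # 256x256 image, num_input_channels = 3
for _ in range(7):  # the seven stride-2 convolutions
    size, channels = size // 2, channels * 2  # kernel=3, padding=1, stride=2 halves H and W
print(size, channels)  # 2 384
# the final conv (kernel=2, padding=0, stride=1) reduces 2x2 to 1x1 and doubles the channels
print(1, channels * 2)  # 1 768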

Using nn.Linear(1, 1024*32) instead makes it run, but later produces the warning "UserWarning: Using a target size (torch.Size([3, 256, 256])) that is different to the input size (torch.Size([768, 3, 256, 256]))". I think that one comes from my decoder, though.

What am I not understanding correctly here?

Upvotes: 0

Views: 239

Answers (1)

flawr

Reputation: 11628

All torch.nn modules expect batched inputs, and it seems that in your case you have no batch dimension. Without knowing your code, I'm assuming you are using

my_input.shape == (3, 256, 256)

But you will need to add a batch dimension, that is, you need to have

my_input.shape == (1, 3, 256, 256)

You can easily do that by introducing a dummy dimension using:

my_input = my_input[None, ...]
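
(my_input.unsqueeze(0) does the same thing, if you prefer that spelling.)

To illustrate why the flattened shape then matches the Linear layer, here is a small sketch with dummy tensors (assuming the shapes from your layer comments):

import torch
import torch.nn as nn

flatten = nn.Flatten()  # default start_dim=1: dim 0 is treated as the batch dimension

unbatched = torch.randn(768, 1, 1)   # what the conv stack produces for a (3, 256, 256) input
print(flatten(unbatched).shape)      # torch.Size([768, 1]) -> the failing mat1 from the error

batched = torch.randn(1, 768, 1, 1)  # what it produces for a (1, 3, 256, 256) input
print(flatten(batched).shape)        # torch.Size([1, 768]) -> matches nn.Linear(768, 1024*32)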

Upvotes: 1
