Reputation: 255
I have defined my autoencoder in PyTorch as follows (it gives me an 8-dimensional bottleneck at the output of the encoder, which works fine: torch.Size([1, 8, 1, 1])):
self.encoder = nn.Sequential(
    nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(64, 8, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.MaxPool2d(7, stride=1)
)
self.decoder = nn.Sequential(
    nn.ConvTranspose2d(8, 64, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.Conv2d(64, 32, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(32, input_shape[0], kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Sigmoid()
)
What I cannot do is train the autoencoder with this forward pass:
def forward(self, x):
    x = self.encoder(x)
    x = self.decoder(x)
    return x
The decoder gives me an error saying it cannot upsample the tensor:
Calculated padded input size per channel: (3 x 3). Kernel size: (4 x 4). Kernel size can't be greater than actual input size
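For reference, here is a self-contained snippet that reproduces the error. The single-channel 84x84 input is just an assumed size (any input that yields the 1x1 bottleneck above fails the same way):

import torch
import torch.nn as nn

# Same layers as above, assuming input_shape[0] == 1 (grayscale)
encoder = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(64, 8, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.MaxPool2d(7, stride=1),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(8, 64, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.Conv2d(64, 32, kernel_size=4, stride=2),  # fails: its input is only 3x3
    nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Sigmoid(),
)

x = torch.randn(1, 1, 84, 84)           # assumed input size
code = encoder(x)
print(code.shape)                       # torch.Size([1, 8, 1, 1])
decoder(code)                           # raises: Kernel size can't be greater than actual input size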
Upvotes: 0
Views: 5740
Reputation: 255
I have managed to implement an autoencoder that provides unsupervised clustering (in my case, 8 classes).
This is not an expert solution; I owe thanks to @Szymon Maszke for the suggestions.
self.encoder = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(64, 2, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.MaxPool2d(6, stride=1)
)
self.decoder = nn.Sequential(
    nn.ConvTranspose2d(2, 64, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.ConvTranspose2d(64, 32, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4)
)
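For completeness, a quick shape trace of this version. The 76x76 input is an assumed size that yields a 1x1 bottleneck; note that the reconstruction comes out 68x68 rather than 76x76, so inputs may need resizing or cropping before computing a reconstruction loss:

import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(64, 2, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.MaxPool2d(6, stride=1),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(2, 64, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.ConvTranspose2d(64, 32, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4),
)

x = torch.randn(1, 1, 76, 76)       # assumed input size
code = encoder(x)
print(code.shape)                   # torch.Size([1, 2, 1, 1])
print(decoder(code).shape)          # torch.Size([1, 1, 68, 68])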
Upvotes: 2
Reputation: 24701
You are not upsampling enough via ConvTranspose2d; the shape coming out of your encoder is only 1 pixel (width x height). See this example:
import torch
layer = torch.nn.ConvTranspose2d(8, 64, kernel_size=3, stride=1)
print(layer(torch.randn(64, 8, 1, 1)).shape)
This prints your exact (3, 3) shape after upsampling.
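With no padding, ConvTranspose2d produces (in - 1) * stride + kernel_size pixels per side, so you can pick kernel size and stride to reach whatever spatial size you need:

import torch

# From a 1x1 code, output side = (1 - 1) * stride + kernel_size = kernel_size
layer = torch.nn.ConvTranspose2d(8, 64, kernel_size=7, stride=2)
print(layer(torch.randn(1, 8, 1, 1)).shape)   # torch.Size([1, 64, 7, 7])

# Stacking another layer keeps growing it: (7 - 1) * 2 + 4 = 16
layer2 = torch.nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2)
print(layer2(torch.randn(1, 64, 7, 7)).shape) # torch.Size([1, 32, 16, 16])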
You can:
- make the kernel smaller: instead of 4 in the first Conv2d of the decoder, use 3, 2, or even 1
- upsample more: torch.nn.ConvTranspose2d(8, 64, kernel_size=7, stride=2) would give you 7x7 (as in the sketch above)
- downsample less in the encoder, so its output is at least 4x4 or maybe 5x5. If you squash your image so much, there is no way to encode enough information into one pixel, and even if the code passes, the network won't learn any useful representation.
Upvotes: 4