Reputation: 28044
I am unable to overfit batches with multiple samples using an autoencoder.
A fully connected decoder seems to handle more samples per batch than a conv decoder, but it also fails when the number of samples increases. Why is this happening, and how can I debug it?
I am trying to use an autoencoder on 1D data points of size (n, 1, 1024), where n is the number of samples in the batch.
I am trying to overfit to that single batch.
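The training itself is just a standard loop that fits the model to that one fixed batch, roughly like the sketch below (`model` and `batch` are placeholders for my autoencoder and the fixed batch; the exact optimizer settings are not the point here):

import torch
from torch import nn

def overfit_single_batch(model, batch, steps=1000, lr=1e-3):
    # Repeatedly fit the autoencoder to a single fixed batch; the target is the input itself.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(steps):
        optimizer.zero_grad()
        reconstruction = model(batch)
        loss = criterion(reconstruction, batch)
        loss.backward()
        optimizer.step()
    return loss.item()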
Using a convolutional decoder, I am only able to fit a single sample (n=1), and when n>1 I am unable to drop the loss (MSE) below 0.2.
In blue: expected output (=input), in orange: reconstruction.
Multiple samples, single batch, loss won't go down:
Using more than one sample, we can see that the net learns the general shape of the input (= output) signal, but it greatly misses the bias (the constant offset).
Using a fully connected decoder does manage to reconstruct batches of multiple samples:
from torch import nn


# Conv ('same' padding) -> ReLU -> MaxPool: keeps the number of timesteps through
# the convolution and halves it with the pooling layer.
class Conv1DBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self._in_channels = in_channels
        self._out_channels = out_channels
        self._kernel_size = kernel_size
        self._block = nn.Sequential(
            nn.Conv1d(
                in_channels=self._in_channels,
                out_channels=self._out_channels,
                kernel_size=self._kernel_size,
                stride=1,
                padding=(self._kernel_size - 1) // 2,
            ),
            # nn.BatchNorm1d(num_features=out_channels),
            nn.ReLU(True),
            nn.MaxPool1d(kernel_size=2, stride=2),
        )

    def forward(self, x):
        for layer in self._block:
            x = layer(x)
        return x
# Conv ('same' padding) -> ReLU -> linear Upsample: multiplies the number of
# timesteps by `factor`.
class Upsample1DBlock(nn.Module):
    def __init__(self, in_channels, out_channels, factor):
        super().__init__()
        self._in_channels = in_channels
        self._out_channels = out_channels
        self._factor = factor
        self._block = nn.Sequential(
            nn.Conv1d(
                in_channels=self._in_channels,
                out_channels=self._out_channels,
                kernel_size=3,
                stride=1,
                padding=1,
            ),  # 'same'
            nn.ReLU(True),
            nn.Upsample(scale_factor=self._factor, mode='linear', align_corners=True),
        )

    def forward(self, x):
        x_tag = x
        for layer in self._block:
            x_tag = layer(x_tag)
        # interpolated = F.interpolate(x, scale_factor=0.5, mode='linear')  # resnet idea
        return x_tag
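A quick shape check of the two blocks (a standalone snippet, assuming the class definitions above are in scope) confirms the intended downsampling/upsampling behaviour:

import torch

x = torch.randn(4, 1, 1024)  # (n, channels, length)
down = Conv1DBlock(in_channels=1, out_channels=8, kernel_size=15)(x)
print(down.shape)  # torch.Size([4, 8, 512]) -- the conv keeps the length, MaxPool halves it

z = torch.randn(4, 128, 1)
up = Upsample1DBlock(in_channels=128, out_channels=64, factor=4)(z)
print(up.shape)  # torch.Size([4, 64, 4]) -- Upsample multiplies the length by `factor`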
encoder:
self._encoder = nn.Sequential(
    # n, 1024
    nn.Unflatten(dim=1, unflattened_size=(1, 1024)),
    # n, 1, 1024
    Conv1DBlock(in_channels=1, out_channels=8, kernel_size=15),
    # n, 8, 512
    Conv1DBlock(in_channels=8, out_channels=16, kernel_size=11),
    # n, 16, 256
    Conv1DBlock(in_channels=16, out_channels=32, kernel_size=7),
    # n, 32, 128
    Conv1DBlock(in_channels=32, out_channels=64, kernel_size=5),
    # n, 64, 64
    Conv1DBlock(in_channels=64, out_channels=128, kernel_size=3),
    # n, 128, 32
    nn.Conv1d(in_channels=128, out_channels=128, kernel_size=32, stride=1, padding=0),  # FC
    # n, 128, 1
    nn.Flatten(start_dim=1, end_dim=-1),
    # n, 128
)
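As the shape comments indicate, the encoder maps a flat (n, 1024) batch to a 128-dimensional code; a minimal check of that (here `AutoEncoder` is just a placeholder name for the module that owns `self._encoder`):

import torch

model = AutoEncoder()          # placeholder for the module holding self._encoder
x = torch.randn(16, 1024)      # flat input; nn.Unflatten reshapes it to (16, 1, 1024)
code = model._encoder(x)
print(code.shape)              # torch.Size([16, 128])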
conv decoder:
self._decoder = nn.Sequential(
    nn.Unflatten(dim=1, unflattened_size=(128, 1)),                # 1
    Upsample1DBlock(in_channels=128, out_channels=64, factor=4),   # 4
    Upsample1DBlock(in_channels=64, out_channels=32, factor=4),    # 16
    Upsample1DBlock(in_channels=32, out_channels=16, factor=4),    # 64
    Upsample1DBlock(in_channels=16, out_channels=8, factor=4),     # 256
    Upsample1DBlock(in_channels=8, out_channels=1, factor=4),      # 1024
    nn.ReLU(True),
    nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=1),
    nn.ReLU(True),
    nn.Flatten(start_dim=1, end_dim=-1),
    nn.Linear(1024, 1024),
)
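With this decoder the full round trip preserves the input shape (again `AutoEncoder` is only a placeholder name for the surrounding module):

import torch

model = AutoEncoder()
x = torch.randn(16, 1024)
out = model._decoder(model._encoder(x))
print(out.shape)               # torch.Size([16, 1024]) -- same shape as the input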
FC decoder:
self._decoder = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(True),
    nn.Linear(256, 512),
    nn.ReLU(True),
    nn.Linear(512, 1024),
    nn.ReLU(True),
    nn.Flatten(start_dim=1, end_dim=-1),
    nn.Linear(1024, 1024),
)
Another observation is that when the batch size increases further, to say 16, the FC decoder also starts to fail.
The image shows 4 samples from a 16-sample batch I am trying to overfit.
What could be wrong with the conv decoder?
How can I debug this or make the conv decoder work?
Upvotes: 3
Views: 898
Reputation: 1515
In your case, you are overfitting on a single batch. Since the linear layers have more parameters than the convolutional layers, they may simply be memorising the small amount of data more easily.
Because you are overfitting on a single batch, a small batch is very easy to memorise; a larger batch, with only a single network update per batch, instead pushes the network to learn generalized, abstract features. (This works better when there are more batches with a wide variety of data.)
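One quick way to check the parameter argument is to count the parameters of both decoders (a rough sketch; `decoder_conv` and `decoder_fc` stand for the two nn.Sequential definitions from your question):

from torch import nn

def count_parameters(module: nn.Module) -> int:
    # Total number of parameters in the module.
    return sum(p.numel() for p in module.parameters())

print("conv decoder parameters:", count_parameters(decoder_conv))
print("fc decoder parameters:  ", count_parameters(decoder_fc))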
I tried to reproduce your problem using simple Gaussian data. Simply using LeakyReLU in place of ReLU, together with a proper learning rate, solved the problem. The same architecture you gave was used.
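The change itself is only the activation inside the blocks, something like the sketch below (shown for Conv1DBlock as a free function `make_conv_block` purely for illustration; the same swap applies to Upsample1DBlock and to the standalone ReLUs in the decoder):

from torch import nn

def make_conv_block(in_channels, out_channels, kernel_size):
    # Same layers as Conv1DBlock in the question, with LeakyReLU instead of ReLU.
    return nn.Sequential(
        nn.Conv1d(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=1,
            padding=(kernel_size - 1) // 2,
        ),
        nn.LeakyReLU(inplace=True),  # was nn.ReLU(True)
        nn.MaxPool1d(kernel_size=2, stride=2),
    )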
Hyperparameters:
batch_size = 16
epochs = 100
lr = 1e-3
optimizer = Adam
loss (after training with ReLU) = 0.27265918254852295
loss (after training with LeakyReLU) = 0.0004763789474964142
Upvotes: 2