Reputation: 28044
I am unable to overfit batches with multiple samples using an autoencoder.
A fully connected decoder seems to handle more samples per batch than a conv decoder, but it also fails when the number of samples increases. Why is this happening, and how can I debug it?
I am trying to use an autoencoder on 1D data points of size (n, 1, 1024), where n is the number of samples in the batch.
I am trying to overfit to that single batch.
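The training itself is just a standard loop that fits the model to that one fixed batch, roughly like the sketch below (`model` and `batch` are placeholders for my autoencoder and the fixed batch; the exact optimizer settings are not the point here):

import torch
from torch import nn

def overfit_single_batch(model, batch, steps=1000, lr=1e-3):
    # Repeatedly fit the autoencoder to a single fixed batch; the target is the input itself.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(steps):
        optimizer.zero_grad()
        reconstruction = model(batch)
        loss = criterion(reconstruction, batch)
        loss.backward()
        optimizer.step()
    return loss.item()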
Using a convolutional decoder, I am only able to fit a single sample (n=1), and when n>1 I am unable to drop the loss (MSE) below 0.2.
In blue: expected output (=input), in orange: reconstruction.
Multiple samples, single batch, loss won't go down:
Using more than one sample, we can see that the net learns the general shape of the input (= output) signal, but it greatly misses the bias (the constant offset).
Using a fully connected decoder does manage to reconstruct batches of multiple samples:
from torch import nn


# Conv ('same' padding) -> ReLU -> MaxPool: keeps the number of timesteps through
# the convolution and halves it with the pooling layer.
class Conv1DBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self._in_channels = in_channels
        self._out_channels = out_channels
        self._kernel_size = kernel_size
        self._block = nn.Sequential(
            nn.Conv1d(
                in_channels=self._in_channels,
                out_channels=self._out_channels,
                kernel_size=self._kernel_size,
                stride=1,
                padding=(self._kernel_size - 1) // 2,
            ),
            # nn.BatchNorm1d(num_features=out_channels),
            nn.ReLU(True),
            nn.MaxPool1d(kernel_size=2, stride=2),
        )

    def forward(self, x):
        for layer in self._block:
            x = layer(x)
        return x
# Conv ('same' padding) -> ReLU -> linear Upsample: multiplies the number of
# timesteps by `factor`.
class Upsample1DBlock(nn.Module):
    def __init__(self, in_channels, out_channels, factor):
        super().__init__()
        self._in_channels = in_channels
        self._out_channels = out_channels
        self._factor = factor
        self._block = nn.Sequential(
            nn.Conv1d(
                in_channels=self._in_channels,
                out_channels=self._out_channels,
                kernel_size=3,
                stride=1,
                padding=1,
            ),  # 'same'
            nn.ReLU(True),
            nn.Upsample(scale_factor=self._factor, mode='linear', align_corners=True),
        )

    def forward(self, x):
        x_tag = x
        for layer in self._block:
            x_tag = layer(x_tag)
        # interpolated = F.interpolate(x, scale_factor=0.5, mode='linear')  # resnet idea
        return x_tag
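A quick shape check of the two blocks (a standalone snippet, assuming the class definitions above are in scope) confirms the intended downsampling/upsampling behaviour:

import torch

x = torch.randn(4, 1, 1024)  # (n, channels, length)
down = Conv1DBlock(in_channels=1, out_channels=8, kernel_size=15)(x)
print(down.shape)  # torch.Size([4, 8, 512]) -- the conv keeps the length, MaxPool halves it

z = torch.randn(4, 128, 1)
up = Upsample1DBlock(in_channels=128, out_channels=64, factor=4)(z)
print(up.shape)  # torch.Size([4, 64, 4]) -- Upsample multiplies the length by `factor`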
encoder:
self._encoder = nn.Sequential(
    # n, 1024
    nn.Unflatten(dim=1, unflattened_size=(1, 1024)),
    # n, 1, 1024
    Conv1DBlock(in_channels=1, out_channels=8, kernel_size=15),
    # n, 8, 512
    Conv1DBlock(in_channels=8, out_channels=16, kernel_size=11),
    # n, 16, 256
    Conv1DBlock(in_channels=16, out_channels=32, kernel_size=7),
    # n, 32, 128
    Conv1DBlock(in_channels=32, out_channels=64, kernel_size=5),
    # n, 64, 64
    Conv1DBlock(in_channels=64, out_channels=128, kernel_size=3),
    # n, 128, 32
    nn.Conv1d(in_channels=128, out_channels=128, kernel_size=32, stride=1, padding=0),  # FC
    # n, 128, 1
    nn.Flatten(start_dim=1, end_dim=-1),
    # n, 128
)
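As the shape comments indicate, the encoder maps a flat (n, 1024) batch to a 128-dimensional code; a minimal check of that (here `AutoEncoder` is just a placeholder name for the module that owns `self._encoder`):

import torch

model = AutoEncoder()          # placeholder for the module holding self._encoder
x = torch.randn(16, 1024)      # flat input; nn.Unflatten reshapes it to (16, 1, 1024)
code = model._encoder(x)
print(code.shape)              # torch.Size([16, 128])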
conv decoder:
self._decoder = nn.Sequential(
    nn.Unflatten(dim=1, unflattened_size=(128, 1)),                # 1
    Upsample1DBlock(in_channels=128, out_channels=64, factor=4),   # 4
    Upsample1DBlock(in_channels=64, out_channels=32, factor=4),    # 16
    Upsample1DBlock(in_channels=32, out_channels=16, factor=4),    # 64
    Upsample1DBlock(in_channels=16, out_channels=8, factor=4),     # 256
    Upsample1DBlock(in_channels=8, out_channels=1, factor=4),      # 1024
    nn.ReLU(True),
    nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=1),
    nn.ReLU(True),
    nn.Flatten(start_dim=1, end_dim=-1),
    nn.Linear(1024, 1024),
)
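With this decoder the full round trip preserves the input shape (again `AutoEncoder` is only a placeholder name for the surrounding module):

import torch

model = AutoEncoder()
x = torch.randn(16, 1024)
out = model._decoder(model._encoder(x))
print(out.shape)               # torch.Size([16, 1024]) -- same shape as the input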
FC decoder:
self._decoder = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(True),
    nn.Linear(256, 512),
    nn.ReLU(True),
    nn.Linear(512, 1024),
    nn.ReLU(True),
    nn.Flatten(start_dim=1, end_dim=-1),
    nn.Linear(1024, 1024),
)
Another observation is that when the batch size increases further, to say 16, the FC decoder also starts to fail.
The image shows 4 samples from a 16-sample batch I am trying to overfit.
What could be wrong with the conv decoder?
How can I debug this or make the conv decoder work?
Upvotes: 3
Views: 898
Reputation: 1515
In your case, you are overfitting on a single batch. Since the linear layers have more parameters than the convolutional layers, they may simply be memorising the small amount of data more easily.
Because you are overfitting on a single batch, a small batch is very easy to memorise; a larger batch, with only a single network update per batch, instead pushes the network to learn generalized, abstract features. (This works better when there are more batches with a wide variety of data.)
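One quick way to check the parameter argument is to count the parameters of both decoders (a rough sketch; `decoder_conv` and `decoder_fc` stand for the two nn.Sequential definitions from your question):

from torch import nn

def count_parameters(module: nn.Module) -> int:
    # Total number of parameters in the module.
    return sum(p.numel() for p in module.parameters())

print("conv decoder parameters:", count_parameters(decoder_conv))
print("fc decoder parameters:  ", count_parameters(decoder_fc))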
I tried to reproduce your problem using simple Gaussian data. Simply using LeakyReLU in place of ReLU, together with a proper learning rate, solved the problem. The same architecture you gave was used.
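The change itself is only the activation inside the blocks, something like the sketch below (shown for Conv1DBlock as a free function `make_conv_block` purely for illustration; the same swap applies to Upsample1DBlock and to the standalone ReLUs in the decoder):

from torch import nn

def make_conv_block(in_channels, out_channels, kernel_size):
    # Same layers as Conv1DBlock in the question, with LeakyReLU instead of ReLU.
    return nn.Sequential(
        nn.Conv1d(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=1,
            padding=(kernel_size - 1) // 2,
        ),
        nn.LeakyReLU(inplace=True),  # was nn.ReLU(True)
        nn.MaxPool1d(kernel_size=2, stride=2),
    )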
Hyperparameters:
batch_size = 16
epochs = 100
lr = 1e-3
optimizer = Adam
loss (after training with ReLU) = 0.27265918254852295
loss (after training with LeakyReLU) = 0.0004763789474964142
Upvotes: 2