ravi

Reputation: 6328

LayerNorm inside nn.Sequential in torch

I am trying to use LayerNorm inside nn.Sequential in torch. This is what I am looking for:

import torch.nn as nn

class LayerNormCnn(nn.Module):
    def __init__(self):
        super(LayerNormCnn, self).__init__()
        self.net = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
                nn.LayerNorm(),
                nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
                nn.LayerNorm(),
                nn.ReLU(),
            )

    def forward(self, x):
        x = self.net(x)
        return x

Unfortunately, it doesn't work because LayerNorm requires normalized_shape as input. The code above throws the following exception:

    nn.LayerNorm(),
TypeError: __init__() missing 1 required positional argument: 'normalized_shape'
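
If the input resolution were fixed, I suppose I could hard-code the full normalized shape (channels plus spatial dimensions) so that LayerNorm does fit inside nn.Sequential. A rough sketch for a 3x128x128 input, with the output shapes worked out by hand:

import torch.nn as nn

# each stride-2 conv halves the spatial size: 128 -> 64 -> 32
net = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # -> (32, 64, 64)
        nn.LayerNorm([32, 64, 64]),
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # -> (64, 32, 32)
        nn.LayerNorm([64, 32, 32]),
        nn.ReLU(),
    )

But then the shapes have to be worked out and typed in by hand for every input size.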

Right now, this is how I have implemented it:

import torch
import torch.nn as nn
import torch.nn.functional as F


class LayerNormCnn(nn.Module):
    def __init__(self, state_shape):
        super(LayerNormCnn, self).__init__()
        self.conv1 = nn.Conv2d(state_shape[0], 32, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)

        # compute shape by doing a forward pass
        with torch.no_grad():
            fake_input = torch.randn(1, *state_shape)
            out        = self.conv1(fake_input)
            bn1_size   = out.size()[1:]
            out        = self.conv2(out)
            bn2_size   = out.size()[1:]

        self.bn1 = nn.LayerNorm(bn1_size)
        self.bn2 = nn.LayerNorm(bn2_size)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        return x

if __name__ == '__main__':
    in_shape   = (3, 128, 128)
    batch_size = 32

    model = LayerNormCnn(in_shape)
    x = torch.randn((batch_size,) + in_shape)
    out = model(x)
    print(out.shape)

Is it possible to use LayerNorm inside nn.Sequential?

Upvotes: 3

Views: 2625

Answers (1)

chakrr

Reputation: 527

The original layer normalisation paper advised against using layer normalisation in CNNs, because the statistics computed over receptive fields near the image boundary differ from those computed over the interior of the image. This issue does not arise with RNNs, which is what layer norm was originally tested on. Are you sure you want to be using LayerNorm? If you're looking to compare a different normalisation technique against BatchNorm, consider GroupNorm. It drops the LayerNorm assumption that all channels in a layer contribute equally to a prediction, an assumption that is particularly problematic when the layer is convolutional. Instead, the channels are divided into groups, which still allows a GroupNorm layer to learn different statistics across channels.
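
For example, a minimal sketch of GroupNorm dropped into the Sequential from your question might look like this (8 groups per layer is an arbitrary choice; it just has to divide the channel count):

import torch
import torch.nn as nn

# GroupNorm normalises over groups of channels per sample, so it does not
# need to know the spatial size and can sit directly inside nn.Sequential.
net = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
        nn.GroupNorm(num_groups=8, num_channels=32),
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
        nn.GroupNorm(num_groups=8, num_channels=64),
        nn.ReLU(),
    )

x = torch.randn(32, 3, 128, 128)
print(net(x).shape)  # torch.Size([32, 64, 32, 32])

Note that nn.GroupNorm(1, num_channels) normalises over the same dimensions as the LayerNorm in your second snippet (all channels and spatial positions of each sample), differing only in its affine parameters, which are per-channel rather than per-element.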

Please refer here for related discussion.

Upvotes: 1
