klee

Reputation: 217

Fixing incorrect dimensions in PyTorch neural network

I am trying to train my neural network, which is written in PyTorch, but the forward pass fails with the following traceback because of incorrect dimensions:

Traceback (most recent call last):
  File "plot_parametric_pytorch.py", line 139, in <module>
    ops = opfun(X_train[smpl])
  File "plot_parametric_pytorch.py", line 92, in <lambda>
    opfun = lambda X: model.forward(Variable(torch.from_numpy(X)))
  File "/mnt_home/klee/LBSBGenGapSharpnessResearch/deepnet.py", line 77, in forward
    x = self.features(x)
  File "/home/klee/anaconda3/envs/sharpenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/klee/anaconda3/envs/sharpenv/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/klee/anaconda3/envs/sharpenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/klee/anaconda3/envs/sharpenv/lib/python3.7/site-packages/torch/nn/modules/pooling.py", line 141, in forward
    self.return_indices)
  File "/home/klee/anaconda3/envs/sharpenv/lib/python3.7/site-packages/torch/_jit_internal.py", line 209, in fn
    return if_false(*args, **kwargs)
  File "/home/klee/anaconda3/envs/sharpenv/lib/python3.7/site-packages/torch/nn/functional.py", line 539, in _max_pool2d
    input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (512x1x1). Calculated output size: (512x0x0). Output size is too small

This all happens when trying to run a forward pass. I'm fairly sure it is a small bug, but I am new to writing PyTorch code and can't tell where it is. For reference, when I checked the dimensions of the Keras version of this model using model.summary(), the final dimensions before flattening and adding the dense layers (which I think should happen in self.classifier in PyTorch, although I am not sure) were 512 x 1 x 1.

This is my model in PyTorch:

class VGG(nn.Module):
    def __init__(self, num_classes=10):
        super(VGG, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Conv2d(64, 64, kernel_size=3, padding = 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(64, 128, kernel_size=3, padding = 1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
            nn.Conv2d(128, 128, kernel_size=3, padding = 1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(128, 256, kernel_size=3, padding = 1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
            nn.Conv2d(256, 256, kernel_size=3, padding = 1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
            nn.Conv2d(256, 256, kernel_size=3, padding = 1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(256, 512, kernel_size=3, padding = 1, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
            nn.Conv2d(512, 512, kernel_size=3, padding = 1, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
            nn.Conv2d(512, 512, kernel_size=3, padding = 1, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(512, 512, kernel_size=3, padding = 1, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
            nn.Conv2d(512, 512, kernel_size=3, padding = 1, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
            nn.Conv2d(512, 512, kernel_size=3, padding = 1, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(512, 512, bias=False),
            nn.Dropout(0.5),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 512)
        x = self.classifier(x)
        return F.log_softmax(x, dim=1)

def cifar10_deep(**kwargs):
    num_classes = kwargs.get('num_classes', 10)
    return VGG(num_classes)


def cifar100_deep(**kwargs):
    num_classes = kwargs.get('num_classes', 100)
    return VGG(num_classes)

And I am trying to run the following code:

cudnn.benchmark = True
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train = X_train.astype('float32')
X_train = np.transpose(X_train, axes=(0, 3, 1, 2))
X_test = X_test.astype('float32')
X_test = np.transpose(X_test, axes=(0, 3, 1, 2))
X_train /= 255
X_test /= 255
device = torch.device('cuda:0')

# This is where you can load any model of your choice.
# I stole PyTorch Vision's VGG network and modified it to work on CIFAR-10.
# You can take this line out and add any other network and the code
# should run just fine.
model = cifar_shallow.cifar10_shallow()
#model.to(device)

# Forward pass
opfun = lambda X: model.forward(Variable(torch.from_numpy(X)))

# Forward pass through the network given the input
predsfun = lambda op: np.argmax(op.data.numpy(), 1)

# Do the forward pass, then compute the accuracy
accfun = lambda op, y: np.mean(np.equal(predsfun(op), y.squeeze()))*100

# Initial point
x0 = deepcopy(model.state_dict())

# Number of epochs to train for
# Choose a large value since LB training needs higher values
# Changed from 150 to 30
nb_epochs = 30 
batch_range = [25, 40, 50, 64, 80, 128, 256, 512, 625, 1024, 1250, 1750, 2048, 2500, 3125, 4096, 4500, 5000]

# parametric plot (i.e., don't train the network if set to True)
hotstart = False

if not hotstart:
    for batch_size in batch_range:
        optimizer = torch.optim.Adam(model.parameters())
        model.load_state_dict(x0)
        #model.to(device)
        average_loss_over_epoch = '-'
        print('Optimizing the network with batch size %d' % batch_size)
        np.random.seed(1337) #So that both networks see same sequence of batches
        for e in range(nb_epochs):
            model.eval()
            print('Epoch:', e, ' of ', nb_epochs, 'Average loss:', average_loss_over_epoch)
            average_loss_over_epoch = 0

            # Checkpoint the model every epoch
            torch.save(model.state_dict(), "./models/ShallowNetCIFAR10BatchSize" + str(batch_size) + ".pth")
            array = np.random.permutation(range(X_train.shape[0]))
            slices = X_train.shape[0] // batch_size
            beginning = 0
            end = 1

            # Training loop!
            for _ in range(slices):
                start_index = batch_size * beginning 
                end_index = batch_size * end
                smpl = array[start_index:end_index]
                model.train()
                optimizer.zero_grad()
                ops = opfun(X_train[smpl])  # <-- error in this line
                tgts = Variable(torch.from_numpy(y_train[smpl]).long().squeeze())
                loss_fn = F.nll_loss(ops, tgts)
                average_loss_over_epoch += loss_fn.data.numpy() / (X_train.shape[0] // batch_size)
                loss_fn.backward()
                optimizer.step()
                beginning += 1
                end += 1

I am wondering where in my model I went wrong. I was writing the PyTorch version of the following Keras model. Any help in fixing the small bug would be appreciated!


def deepnet(nb_classes):
    global img_size
    model = Sequential()
    model.add(Conv2D(64, (3, 3), input_shape=img_size))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu'))
    model.add(Dropout(0.3))
    model.add(Conv2D(64, (3, 3), padding='same'))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))




    model.add(Conv2D(128, (3, 3), padding='same'))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu')); model.add(Dropout(0.4))
    model.add(Conv2D(128, (3, 3), padding='same'))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))



    model.add(Conv2D(256, (3, 3), padding='same'))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu')); model.add(Dropout(0.4))
    model.add(Conv2D(256, (3, 3), padding='same'))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu')); model.add(Dropout(0.4))
    model.add(Conv2D(256, (3, 3), padding='same'))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))



    model.add(Conv2D(512, (3, 3), padding='same'))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu')); model.add(Dropout(0.4))
    model.add(Conv2D(512, (3, 3), padding='same'))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu')); model.add(Dropout(0.4))
    model.add(Conv2D(512, (3, 3), padding='same'))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))



    model.add(Conv2D(512, (3, 3), padding='same'))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu')); model.add(Dropout(0.4))
    model.add(Conv2D(512, (3, 3), padding='same'))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu')); model.add(Dropout(0.4))
    model.add(Conv2D(512, (3, 3), padding='same'))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'))


    model.add(Flatten()); model.add(Dropout(0.5))
    model.add(Dense(512))
    model.add(BatchNormalization())
    model.add(Activation('relu')); model.add(Dropout(0.5))
    model.add(Dense(nb_classes, activation='softmax'))
    return model

Please let me know if there is an issue with the way I converted the neural network model from Keras to PyTorch. From what I understand, padding should equal 1 for these 3x3 convolutions in PyTorch to match the padding='same' setting in Keras.

Upvotes: 0

Views: 3077

Answers (1)

Michael Jungo

Reputation: 32972

The first convolution doesn't use padding.

nn.Conv2d(3, 64, kernel_size=3, bias=False)

Therefore the spatial dimensions will be reduced by 2. In the case of CIFAR the input has size [batch_size, 3, 32, 32] and the output would be [batch_size, 64, 30, 30]. For all other convolutions the spatial dimensions are unchanged, but the max pooling will halve them (integer division). Since you have 5 max pooling layers, the height/width change as follows:

30 -> 15 -> 7 -> 3 -> 1 -> 0 (error)
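The progression above can be reproduced with a few lines of plain Python. This is only a sketch of the size arithmetic, not PyTorch itself: the helper name trace_sizes is made up for illustration, and it assumes a 32x32 CIFAR input, 3x3 convolutions with stride 1, and 2x2 max pooling with stride 2 and no padding.

```python
def trace_sizes(size, first_conv_padding, num_pools=5):
    """Follow the height/width through the first conv and the pooling layers.

    Hypothetical helper for illustration only. All later convolutions use
    padding=1 with a 3x3 kernel and therefore leave the size unchanged.
    """
    # 3x3 convolution, stride 1: out = in + 2 * padding - 2
    size = size + 2 * first_conv_padding - 2
    sizes = [size]
    for _ in range(num_pools):
        # 2x2 max pooling, stride 2, no padding: integer division by 2
        size = size // 2
        sizes.append(size)
    return sizes


print(trace_sizes(32, first_conv_padding=0))  # [30, 15, 7, 3, 1, 0]
print(trace_sizes(32, first_conv_padding=1))  # [32, 16, 8, 4, 2, 1]
```

The final 0 in the first list is the pooling layer that raises "Output size is too small".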

In the Keras version you are using padding='same' in the max pooling layers as well, which only pads when the input is not evenly divisible by 2, so the output size is rounded up instead of down (30 -> 15 -> 8 -> 4 -> 2 -> 1, which matches the 512 x 1 x 1 you saw in model.summary()). If you wanted to replicate that behaviour in PyTorch you would have to set the padding of the max pooling layers manually for the ones that receive an input with an odd height/width.
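An alternative to working out the padding per layer is the ceil_mode argument of nn.MaxPool2d, which rounds the output size up. The following is a minimal sketch that only checks shapes, assuming an all-zero dummy feature map of size 30x30 (the size after the unpadded first convolution); it does not verify that the pooled values match Keras at the borders.

```python
import torch
import torch.nn as nn

# ceil_mode=True rounds the output size up instead of down, so an odd
# height/width such as 15 becomes 8 rather than 7.
pool = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)

x = torch.zeros(1, 512, 30, 30)  # dummy feature map (assumed size)
sizes = [x.shape[-1]]
for _ in range(5):  # the model has five max pooling layers
    x = pool(x)
    sizes.append(x.shape[-1])

print(sizes)  # [30, 15, 8, 4, 2, 1]
```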

I don't think that using padding in max pooling with a kernel size of 2 is beneficial, especially as you are using ReLU before them, meaning that the padded max pooling just preserves the border values (it's a different story for bigger kernel sizes).

The simplest solution is to use padding in the first convolution, such that the spatial dimensions are unchanged:

nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False)
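A quick way to confirm the effect of the padding argument is to push a dummy input through both variants and compare the output shapes. This is a sketch using a single all-zero tensor with CIFAR dimensions; the variable names are chosen for illustration.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 3, 32, 32)  # one dummy CIFAR-sized image

no_pad = nn.Conv2d(3, 64, kernel_size=3, bias=False)
padded = nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False)

shape_no_pad = tuple(no_pad(x).shape)
shape_padded = tuple(padded(x).shape)
print(shape_no_pad)  # (1, 64, 30, 30)
print(shape_padded)  # (1, 64, 32, 32)
```

With the padded variant the five pooling layers see 32 -> 16 -> 8 -> 4 -> 2 -> 1, so x.view(-1, 512) in forward receives exactly 512 values per sample.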

Another option would be to remove the last max pooling layer, since the height/width are already 1, but that also means that the last three convolutions are applied to only one value, since the input sizes would be [batch_size, 512, 1, 1], which kind of defeats the purpose of using a convolution.

Upvotes: 1
