Using model.eval() on a neural network results in the same output every time for very different inputs

Question

I have a simple network implemented in PyTorch, say:

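import torch as tor

class network(tor.nn.Module):

    def __init__(self):
        super().__init__()  # required so that parameters(), eval() and train() work

        self.device = device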
        #these are the 3 convolutional synapses; Same convolution;
        self.layer     = sequential(
                                conv2d(3, 3, (23), padding=11),
                                batch_norm_2d(3),
                                Swish(),
        
                                conv2d(3, 3, (11), padding=5),
                                batch_norm_2d(3),
                                Swish(),
        
                                conv2d(3, 3, (5), padding=2),
                                batch_norm_2d(3),
                                Swish(),

                                conv2d(3, 4, (3), padding=15, stride=2),
                                batch_norm_2d(4),
                                Swish(),
        
                                conv2d(4, 8, (3), padding=15, stride=2),
                                batch_norm_2d(8),
                                Swish(),
        
                                conv2d(8, 4, (1)),
                                batch_norm_2d(4),
                                Swish(),
        
                                conv2d(4, 8, (3), padding=15, stride=2),
                                batch_norm_2d(8),
                                Swish(),
                        
                                conv2d(8, 16, (3), padding=15, stride=2),
                                batch_norm_2d(16),
                                Swish(),
        
                                conv2d(16, 8, (1)),
                                batch_norm_2d(8),
                                Swish(),
        
                                conv2d(8, 16, (3), padding=15, stride=2),
                                batch_norm_2d(16),
                                Swish(),
                        
                                conv2d(16, 32, (3), padding=15, stride=2),
                                batch_norm_2d(32),
                                Swish(),
        
                                conv2d(32, 16, (1)),
                                batch_norm_2d(16),
                                Swish(),
        
                                conv2d(16, 32, (3), padding=15, stride=2),
                                batch_norm_2d(32),
                                Swish(),
                        
                                conv2d(32, 64, (3), padding=15, stride=2),
                                batch_norm_2d(64),
                                Swish(),
        
                                conv2d(64, 32, (1)),
                                batch_norm_2d(32),
                                Swish(),
        
                                conv2d(32, 64, (3), padding=15, stride=2),
                                batch_norm_2d(64),
                                Swish(),
                        
                                conv2d(64, 128, (3), padding=15, stride=2),
                                batch_norm_2d(128),
                                Swish(),
        
                                conv2d(128, 64, (1)),
                                batch_norm_2d(64),
                                Swish(),
        
                                conv2d(64, 128, (3), padding=15, stride=2),
                                batch_norm_2d(128),
                                Swish(),
                        
                                conv2d(128, 256, (3), padding=15, stride=2),
                                batch_norm_2d(256),
                                Swish(),
        
                                conv2d(256, 128, (1)),
                                batch_norm_2d(128),
                                Swish(),
                                
                                flatten(1, -1),
        
                                linear(128*29*29, 8*8*2*5),
                                batch_norm_1d(8*8*2*5),
                                Swish()
            )
    
    
        #loss and optimizer functions for ethirun
        self.Loss_1 = IoU_Loss() #the loss function for bounding box.
        self.Loss_2 = tor.nn.SmoothL1Loss(reduction='mean')
    
        #the optimizer
        self.Optimizer =     tor.optim.AdamW(self.parameters())#tor.optim.SGD(self.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-5, nesterov=True)
        self.Scheduler = tor.optim.lr_scheduler.StepLR(self.Optimizer, 288, gamma=0.5)
        self.sizes = tor.tensor(range(0, 5), dtype=tor.int64, device=self.device)

    def forward(self, input):
        return self.layer(input)

    def backprop(self, preds, lbls, val_or_trn):
        # takes predictions and labels, calculates the error and backpropagates
        mask = tor.index_select(lbls, -1, self.sizes[0])
        preds.register_hook(lambda grad: grad * mask.float())
        error = self.Loss_2(preds, lbls)

        if val_or_trn == 1:
            # backpropagation
            error.backward()
            self.Optimizer.step()
            self.Scheduler.step()

            # zeroing the gradients.
            self.Optimizer.zero_grad()

        return error.detach()

model = network()

Where the inputs, outputs and channels are arbitrary. Then say I create some random input tensor like this,

input_data = tor.randn(1, 3, 256, 256)

Then I predict some result on this data like this,

model(input_data)

Now say I regenerate input_data several times by calling torch.randn again, while keeping the same model, that is, without re-running model = network().

I get this error,

Expected more than 1 value per channel when training, got input size torch.Size([1, some_value])

So, I tried running it in evaluation mode by using the model.eval() function like this,

model.eval()

with tor.no_grad():
    pred = model(input_data)

model.train()

This works without errors. However, no matter how I change input_data, I always get the same value in pred. If I re-initialize the model's parameters, I get a new pred, which again does not change with different inputs until I re-initialize the model once more with model = network(). What am I doing wrong?

Edit: To give more info on my problem, I'm trying to create a YOLO-like network from scratch. This is the dataset I'm using: https://www.kaggle.com/devdgohil/the-oxfordiiit-pet-dataset

Natthaphon Hongcharoen · Accepted Answer

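Basically, that is what batch normalization does. In training mode, BatchNorm normalizes each batch with that batch's own statistics, which makes training less prone to overfitting. It is also why a single sample raises the "Expected more than 1 value per channel" error: after the flatten, the final batch_norm_1d layer sees only one value per channel. In eval mode, BatchNorm uses the running statistics collected during training instead, so you get the correct result. The same goes for Dropout, which is switched off in eval mode.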

Every CNN model with batch normalization and/or dropout behaves the same way: the output for the same input will differ between train and eval mode.

That is exactly why PyTorch has model.eval(): it switches these layers to their inference behavior so that you get the correct output.
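The difference between the two modes can be seen on a single layer. This is only a minimal sketch, assuming a standalone nn.BatchNorm1d(4) as a stand-in for the final batch_norm_1d layer in the question; the sizes and variable names are made up for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Assumption: one small BatchNorm1d layer stands in for the question's last block.
bn = nn.BatchNorm1d(4)

single = torch.randn(1, 4)           # batch of one sample, as in the question
batch = torch.randn(8, 4) * 3 + 5    # larger batch with non-trivial mean and variance

# Training mode: statistics come from the current batch,
# so a single sample per channel is rejected.
bn.train()
try:
    bn(single)
    train_single_failed = False
except ValueError as err:
    train_single_failed = True
    print("train mode, batch of 1:", err)

# Training mode on a real batch: the output is normalized per channel and
# the running statistics are updated as a side effect.
out_train = bn(batch)

# Eval mode: the stored running statistics are used, so a batch of one works.
bn.eval()
with torch.no_grad():
    out_eval_single = bn(single)
    out_eval = bn(batch)

print("train output, mean per channel:", out_train.mean(dim=0))
print("eval output, mean per channel: ", out_eval.mean(dim=0))
```

The same batch gives different outputs in the two modes because the running statistics have only been updated once and are still far from the batch statistics.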

Edit

The problem is the activation and Batch Normalization at the output.

Only use an output activation that makes the result comparable to the ground truth: for example, sigmoid when you want the output in the range 0 to 1, tanh for -1 to 1, or softmax for probabilities along an axis.
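One way to read this advice, as a sketch only: end the network with a plain linear layer followed by the activation that matches the label range, instead of batch norm plus Swish. The feature size 32 below is hypothetical and far smaller than the question's 128*29*29, and nn.Sigmoid assumes the labels have been scaled to the range 0 to 1.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical sizes for illustration; the question's head is
# linear(128*29*29, 8*8*2*5).
in_features = 32
out_features = 8 * 8 * 2 * 5

# Output head without batch norm: a linear layer, then an activation
# chosen to match the assumed 0-1 label range.
head = nn.Sequential(
    nn.Linear(in_features, out_features),
    nn.Sigmoid(),
)

head.eval()
with torch.no_grad():
    a = head(torch.randn(1, in_features))
    b = head(torch.randn(1, in_features))

print(a.shape, float(a.min()), float(a.max()))
print("outputs differ for different inputs:", not torch.allclose(a, b))
```

Swapping nn.Sigmoid for nn.Tanh, or for nn.Softmax with an explicit dim, follows the same pattern for the other label ranges mentioned above.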

Imagine the relu function (which is basically a simpler version of swish and softplus). It turns everything below 0 into 0. Chances are you need some outputs to be below 0, so your model won't converge at all.
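To make the range argument concrete, here is a small sketch in plain Python that evaluates each activation on a grid and checks whether a negative target can be reached. The grid and the target of -0.5 are arbitrary choices for illustration, and the functions are written out by hand rather than taken from PyTorch.

```python
import math

def relu(x):
    return max(0.0, x)

def swish(x):
    # swish(x) = x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

activations = {
    "relu": relu,
    "swish": swish,
    "sigmoid": sigmoid,
    "tanh": math.tanh,
}

# Arbitrary grid of pre-activation values from -10 to 10 in steps of 0.1.
xs = [i / 10 for i in range(-100, 101)]

# Smallest and largest value each activation produces on the grid.
ranges = {
    name: (min(map(f, xs)), max(map(f, xs)))
    for name, f in activations.items()
}

for name, (lo, hi) in ranges.items():
    print(f"{name:8s} min={lo:+.4f} max={hi:+.4f}")

# Arbitrary negative label value: which activations can output it at all?
target = -0.5
reachable = {name: lo <= target <= hi for name, (lo, hi) in ranges.items()}
print(reachable)
```

On this grid, swish dips only slightly below zero, so a target such as -0.5 is out of reach for relu, swish and sigmoid alike, while tanh covers it.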
