Reputation: 75
I have a simple network implemented in PyTorch, say:
import torch as tor
# Assumed aliases: the post does not show its imports.
from torch.nn import Sequential as sequential, Conv2d as conv2d, Linear as linear
from torch.nn import BatchNorm2d as batch_norm_2d, BatchNorm1d as batch_norm_1d
from torch.nn import Flatten as flatten
from torch.nn import SiLU as Swish  # assumed: SiLU is PyTorch's built-in Swish
# IoU_Loss is the author's own loss class; its definition is not shown in the post.

device = tor.device("cpu")  # assumed: the post does not show how device is defined

class network(tor.nn.Module):
    def __init__(self):
        # Module.__init__ must run before any sub-module is assigned.
        super().__init__()
        self.device = device
        #these are the 3 convolutional synapses; Same convolution;
        self.layer = sequential(
            conv2d(3, 3, 23, padding=11),
            batch_norm_2d(3),
            Swish(),
            conv2d(3, 3, 11, padding=5),
            batch_norm_2d(3),
            Swish(),
            conv2d(3, 3, 5, padding=2),
            batch_norm_2d(3),
            Swish(),
            conv2d(3, 4, 3, padding=15, stride=2),
            batch_norm_2d(4),
            Swish(),
            conv2d(4, 8, 3, padding=15, stride=2),
            batch_norm_2d(8),
            Swish(),
            conv2d(8, 4, 1),
            batch_norm_2d(4),
            Swish(),
            conv2d(4, 8, 3, padding=15, stride=2),
            batch_norm_2d(8),
            Swish(),
            conv2d(8, 16, 3, padding=15, stride=2),
            batch_norm_2d(16),
            Swish(),
            conv2d(16, 8, 1),
            batch_norm_2d(8),
            Swish(),
            conv2d(8, 16, 3, padding=15, stride=2),
            batch_norm_2d(16),
            Swish(),
            conv2d(16, 32, 3, padding=15, stride=2),
            batch_norm_2d(32),
            Swish(),
            conv2d(32, 16, 1),
            batch_norm_2d(16),
            Swish(),
            conv2d(16, 32, 3, padding=15, stride=2),
            batch_norm_2d(32),
            Swish(),
            conv2d(32, 64, 3, padding=15, stride=2),
            batch_norm_2d(64),
            Swish(),
            conv2d(64, 32, 1),
            batch_norm_2d(32),
            Swish(),
            conv2d(32, 64, 3, padding=15, stride=2),
            batch_norm_2d(64),
            Swish(),
            conv2d(64, 128, 3, padding=15, stride=2),
            batch_norm_2d(128),
            Swish(),
            conv2d(128, 64, 1),
            batch_norm_2d(64),
            Swish(),
            conv2d(64, 128, 3, padding=15, stride=2),
            batch_norm_2d(128),
            Swish(),
            conv2d(128, 256, 3, padding=15, stride=2),
            batch_norm_2d(256),
            Swish(),
            conv2d(256, 128, 1),
            batch_norm_2d(128),
            Swish(),
            flatten(1, -1),
            linear(128*29*29, 8*8*2*5),
            batch_norm_1d(8*8*2*5),
            Swish()
        )
        #loss and optimizer functions for ethirun
        self.Loss_1 = IoU_Loss() #the loss function for bounding box.
        self.Loss_2 = tor.nn.SmoothL1Loss(reduction='mean')
        #the optimizer
        #alternative: tor.optim.SGD(self.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-5, nesterov=True)
        self.Optimizer = tor.optim.AdamW(self.parameters())
        self.Scheduler = tor.optim.lr_scheduler.StepLR(self.Optimizer, 288, gamma=0.5)
        self.sizes = tor.tensor(range(0, 5), dtype=tor.int64, device=self.device)

    def forward(self, input):
        return self.layer(input)

    def backprop(self, preds, lbls, val_or_trn):
        #takes predictions and labels and calculates error and backpropagates
        mask = tor.index_select(lbls, -1, self.sizes[0])
        #scale the gradient flowing into preds by the mask
        preds.register_hook(lambda grad: grad * mask.float())
        error = self.Loss_2(preds, lbls)
        if val_or_trn == 1:
            #backpropagation
            error.backward()
            self.Optimizer.step()
            self.Scheduler.step()
            #zeroing the gradients.
            self.Optimizer.zero_grad()
        return error.detach()

model = network()
where the inputs, outputs and channels are arbitrary. Then say I create a random input tensor like this:
input_data = tor.randn(1, 3, 256, 256)
Then I run a prediction on this data like this:
model(input_data)
Say I also change the input_data variable by calling torch.randn several more times while keeping the same model, that is, without re-running model = network().
I get this error:
Expected more than 1 value per channel when training, got input size torch.Size([1, some_value])
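The error above can be reproduced in isolation. The following is a minimal sketch, not the network from the question: `head` is a hypothetical stand-in for its last layers (a linear layer followed by `BatchNorm1d`), with arbitrary sizes. In training mode, `BatchNorm1d` needs more than one value per channel to compute batch statistics, so a batch of one sample fails while a batch of two passes.

```python
import torch
from torch import nn

# Hypothetical stand-in for the tail of the network: Linear followed by BatchNorm1d.
# The sizes (16 -> 4) are arbitrary and chosen only for illustration.
head = nn.Sequential(nn.Linear(16, 4), nn.BatchNorm1d(4))


def try_forward(batch_size: int) -> str:
    """Run one forward pass in training mode and report what happened."""
    head.train()
    x = torch.randn(batch_size, 16)
    try:
        head(x)
        return "ok"
    except ValueError as exc:
        return f"error: {exc}"


single = try_forward(1)  # one sample: no batch variance can be computed
pair = try_forward(2)    # two samples: batch statistics are defined
print(single)
print(pair)
```

The convolutional batch norm layers do not hit this check in the question's network because each channel still has many spatial positions even with a single image.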
So I tried running it in evaluation mode using model.eval(), like this:
model.eval()
with tor.no_grad():
    pred = model(input_data)
model.train()
This works without errors. However, no matter how I change the input_data variable, I always get the same value in pred. If I re-initialize the model's parameters by running model = network() again, I get a new pred, which once again does not change with different inputs. What am I doing wrong?
Edit: To give more information on my problem, I'm trying to create a YOLO-like network from scratch. This is the dataset I'm using: https://www.kaggle.com/devdgohil/the-oxfordiiit-pet-dataset
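For reference, the `128*29*29` input size of the linear layer is consistent with a 256x256 input. The sketch below checks this with the standard convolution output-size formula, floor((n + 2p - k) / s) + 1; the helper `conv_out` and the `(kernel, padding, stride)` tuples are illustrative names, transcribed from the layer list in the question.

```python
def conv_out(size: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, padding, stride) for each conv layer, in order.
layers = [(23, 11, 1), (11, 5, 1), (5, 2, 1)]   # the three "same" convolutions
block = [(3, 15, 2), (3, 15, 2), (1, 0, 1)]     # repeated pattern that follows
layers += block * 6

size = 256
for kernel, padding, stride in layers:
    size = conv_out(size, kernel, padding, stride)

flat_features = 128 * size * size  # 128 channels after the last 1x1 convolution
print(size, flat_features)
```

Because each 3x3 convolution pads by 15 while striding by 2, the spatial size stops shrinking once it reaches 29.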
Upvotes: 3
Views: 6191
Reputation: 2430
That is essentially what batch normalization does. In training mode it normalizes each batch with that batch's own statistics; in eval mode it uses the running statistics accumulated during training instead. The same goes for dropout, which is active during training and disabled in eval mode.
Every CNN model with batch normalization and/or dropout behaves this way: the output for the same input will differ between train and eval mode.
That is exactly why PyTorch has model.eval(): it switches these layers to their inference behavior so that you get the correct output.
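The difference between the two modes can be seen on a single layer. This is a minimal sketch with made-up data (a batch of 8 samples with 3 features), not the network from the question.

```python
import torch
from torch import nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)
x = torch.randn(8, 3) * 5 + 10  # made-up batch, far from zero mean / unit variance

bn.train()
out_train = bn(x)  # normalized with this batch's own mean and variance

bn.eval()
with torch.no_grad():
    out_eval = bn(x)  # normalized with running_mean / running_var instead

train_mean = out_train.mean(dim=0)  # close to zero for every feature
differs = not torch.allclose(out_train, out_eval)

# Dropout in eval mode passes its input through unchanged.
drop = nn.Dropout(p=0.5)
drop.eval()
dropout_is_identity = torch.equal(drop(x), x)
print(differs, dropout_is_identity)
```

After only one training-mode forward pass the running statistics are still close to their initial values (mean 0, variance 1), which is why the eval output differs so much from the training output here.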
The problem is the activation and batch normalization at the output.
Only use an output layer that lets the result match the range of the ground truth: for example, sigmoid when you want the output to be in the range 0 to 1, tanh for -1 to 1, or softmax for probabilities along an axis.
Consider the relu function (which is basically a simpler version of swish and softplus). It turns everything below 0 into 0. Chances are you need some outputs to be below 0, so your model won't converge at all.
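As a quick illustration of these output ranges, here is a small sketch that applies each activation to the same made-up values.

```python
import torch

x = torch.tensor([-2.0, 0.0, 2.0])  # made-up pre-activation values

sig = torch.sigmoid(x)          # every value strictly between 0 and 1
tanh = torch.tanh(x)            # every value strictly between -1 and 1
soft = torch.softmax(x, dim=0)  # non-negative values that sum to 1 along dim 0
relu = torch.relu(x)            # negative inputs are clamped to 0

print(sig, tanh, soft, relu, sep="\n")
```

If the targets can be negative, an output passed through relu can never match them, which is the convergence problem described above.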
Upvotes: 1
Reputation: 131
You defined a neural network, but you are not training it.
For your model to predict different outputs from the same input after several iterations over the data, it needs to be able to adjust its weights and biases.
To do so, you need a loss function and an optimizer, with which you can backpropagate the prediction error and adjust the model's parameters via gradient descent.
I invite you to follow this link, where every step of training a model in PyTorch is covered: QuickStart PyTorch
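The steps above can be sketched as a minimal training loop. The model and data here are hypothetical toy stand-ins, not the network from the question; `SmoothL1Loss` and `AdamW` are used only because the question uses them.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy model and data, for illustration only.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.SmoothL1Loss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)

inputs = torch.randn(32, 4)
targets = inputs.sum(dim=1, keepdim=True)  # made-up regression target

model.train()
losses = []
for step in range(200):
    optimizer.zero_grad()           # clear gradients from the previous step
    preds = model(inputs)           # forward pass
    loss = loss_fn(preds, targets)  # prediction error
    loss.backward()                 # backpropagate the error
    optimizer.step()                # update weights and biases
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Each pass through the loop changes the parameters, so the same input produces a different prediction from one iteration to the next.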
Upvotes: 0