Reputation: 45
I am learning how to build a neural network using PyTorch. This formula is the target of my code: y = 2X^3 + 7X^2 - 8X + 120
It is a regression problem.
I chose this function because it is simple and the output can be computed directly, so I can verify that my neural network predicts the correct output for a given input.
However, I ran into a problem during training. It occurs in this line of code:
loss = loss_func(prediction, outputs)
The loss computed in this line is NaN (not a number).
I am using MSELoss as the loss function. 100 samples are used to train the ANN model. The input X_train ranges from -1000 to 1000.
I believe the problem is caused by the values of X_train together with MSELoss: X_train should perhaps be scaled to values between 0 and 1 so that MSELoss can compute the loss.
However, is it possible to train the ANN model without scaling the input to values between 0 and 1 in a regression problem?
Here is my code; it does not use MinMaxScaler, and it prints NaN for the loss:
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torch.autograd import Variable
#Load datasets
dataset = pd.read_csv('test_100.csv')
x_temp_train = dataset.iloc[:79, :-1].values
y_temp_train = dataset.iloc[:79, -1:].values
x_temp_test = dataset.iloc[80:, :-1].values
y_temp_test = dataset.iloc[80:, -1:].values
#Turn into tensor
X_train = torch.FloatTensor(x_temp_train)
Y_train = torch.FloatTensor(y_temp_train)
X_test = torch.FloatTensor(x_temp_test)
Y_test = torch.FloatTensor(y_temp_test)
#Define an Artificial Neural Network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear = nn.Linear(1, 1)  # input=1, output=1, bias=True

    def forward(self, x):
        x = self.linear(x)
        return x
net = Net()
print(net)
#Define a Loss function and optimizer
optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss()
#Training
inputs = Variable(X_train)
outputs = Variable(Y_train)
for i in range(100):  # epochs = 100
    prediction = net(inputs)
    loss = loss_func(prediction, outputs)
    optimizer.zero_grad()  # zero the parameter gradients
    loss.backward()        # compute gradients (dloss/dx)
    optimizer.step()       # update the parameters
    if i % 10 == 9:        # plot every 10 epochs
        # plot and show the learning progress
        plt.cla()
        plt.scatter(X_train.data.numpy(), Y_train.data.numpy())
        plt.plot(X_train.data.numpy(), prediction.data.numpy(), 'r-', lw=2)
        plt.text(0.5, 0, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 10, 'color': 'red'})
        plt.pause(0.1)
plt.show()
Thanks for your time.
Upvotes: 3
Views: 4307
Reputation: 5918
Is normalization necessary for a regression problem in a neural network?
No.
But...
I can tell you that MSELoss works with non-normalised values. You can verify this yourself:
>>> import torch
>>> torch.nn.MSELoss()(torch.randn(1)-1000, torch.randn(1)+1000)
tensor(4002393.)
MSE is a very well-behaved loss function, and you can't really get NaN without giving it a NaN. I would bet that your model is giving a NaN output. The two most common causes of a NaN are an accidental divide by 0, and absurdly large weights/gradients.
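A quick way to confirm which of these is happening is to check the output and gradients directly. Here is a minimal diagnostic sketch reusing the names from your code (it is not part of the fix, just a way to see where the NaN first appears):

prediction = net(inputs)
print(torch.isnan(prediction).any())  # True once the forward pass produces NaN
loss = loss_func(prediction, outputs)
loss.backward()
for name, p in net.named_parameters():
    if p.grad is not None and torch.isnan(p.grad).any():
        print('NaN gradient in', name)
print([p.abs().max().item() for p in net.parameters()])  # watch the weights grow each step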
I ran a variant of your code on my machine using:
x = torch.randn(79, 1)*1000
y = 2*x**3 + 7*x**2 - 8*x + 120
And it got to NaN in about 20 training steps due to absurdly large weights.
A model can get absurdly large weights if the learning rate is too large. You may think 0.2 is not too large, but that's a typical learning rate people use for normalised data, which forces their gradients to be fairly small. Since you are not using normalised data, let's calculate how large your gradients are (roughly).
First, your x is on the order of 1e3, your expected output y scales as x^3, and then MSE calculates (pred - y)^2. So your loss is on the scale of (1e3^3)^2 = 1e18. This propagates to your gradients, and recall that weight updates are += gradient * learning_rate, so it's easy to see why your weights fairly quickly explode outside of float precision.
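A back-of-the-envelope check of those orders of magnitude (plain Python, illustrative only):

x_scale = 1e3               # typical magnitude of x in the data
y_scale = 2 * x_scale**3    # y is dominated by the 2x^3 term -> ~2e9
loss_scale = y_scale**2     # MSE squares the error -> ~4e18
print(f'{loss_scale:.0e}')  # 4e+18; float32 overflows to inf near 3.4e38, and inf - inf gives NaN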
How to fix this? Well, you could use a learning rate of 2e-7. Or you could just normalise your data. I recommend normalising your data; it has other nice properties for training and avoids these kinds of problems.
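For example, here is a minimal sketch of standardising both inputs and targets before building the tensors, reusing the variable names from your code (standardisation is just one reasonable scaling choice, not the only one):

x_mean, x_std = x_temp_train.mean(), x_temp_train.std()
y_mean, y_std = y_temp_train.mean(), y_temp_train.std()

X_train = torch.FloatTensor((x_temp_train - x_mean) / x_std)
Y_train = torch.FloatTensor((y_temp_train - y_mean) / y_std)
X_test  = torch.FloatTensor((x_temp_test - x_mean) / x_std)
Y_test  = torch.FloatTensor((y_temp_test - y_mean) / y_std)

# After training, map predictions back to the original scale:
# y_pred = net(X_test) * y_std + y_mean

Note that the statistics are computed on the training split only, and the test data is scaled with those same values.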
Upvotes: 8