JAbrams

Reputation: 325

Pytorch loss inf nan

I'm trying to do simple linear regression with 1 feature. It's a simple 'predict salary given years of experience' problem. The NN trains on years of experience (X) and salary (Y). For some reason the loss explodes and ultimately returns inf or nan.

This is the code I have:

    import torch
    import torch.nn as nn
    import pandas as pd
    import numpy as np
    
    dataset = pd.read_csv('./salaries.csv')
    
    x_temp = dataset.iloc[:, :-1].values
    y_temp = dataset.iloc[:, 1:].values
    
    X_train = torch.FloatTensor(x_temp)
    Y_train = torch.FloatTensor(y_temp)
   
    class Model(torch.nn.Module): 
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(1,1)
    
        def forward(self, x):
            y_pred = self.linear(x)
            return y_pred
    
    model = Model()
    
    loss_func = torch.nn.MSELoss(size_average=False)
    optim = torch.optim.SGD(model.parameters(), lr=0.01)
    
    #training 
    for epoch in range(200):
        #calculate y_pred
        y_pred = model(X_train)
    
        #calculate loss
        loss = loss_func(y_pred, Y_train)
        print(epoch, "{:.2f}".format(loss.data))
    
        #backward pass + update weights
        optim.zero_grad()
        loss.backward()
        optim.step()
    
    
    test_exp = torch.FloatTensor([[8.0]])
    print("8 years experience --> ", model(test_exp).data[0][0].item())

As I mentioned, once training starts the loss grows extremely large and ends up showing inf after about the 10th epoch.

I suspect it may have something to do with how I'm loading the data. This is what is in the salaries.csv file:

Years Salary
1.1 39343
1.3 46205
1.5 37731
2   43525
2.2 39891
2.9 56642
3   60150
3.2 54445
3.2 64445
3.7 57189
3.9 63218
4   55794
4   56957
4.1 57081
4.5 61111
4.9 67938
5.1 66029
5.3 83088

Thank you for your help

Upvotes: 2

Views: 23986

Answers (4)

arunppsg

Reputation: 1553

Another possible cause of a nan loss is nan values in the model's input tensor. Try filtering nan values out of the model input.
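A minimal sketch of one way to do that, assuming the `x_temp`/`y_temp` NumPy arrays from the question (the mask logic here is just illustrative):

    import numpy as np
    import torch

    # Keep only the rows where both the feature and the target are finite
    # (drops nan and inf before building the training tensors).
    mask = np.isfinite(x_temp).all(axis=1) & np.isfinite(y_temp).all(axis=1)
    X_train = torch.FloatTensor(x_temp[mask])
    Y_train = torch.FloatTensor(y_temp[mask])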

Upvotes: 0

Aditya Gautam

Reputation: 31

Please reduce the learning rate "lr" to 0.001 or 0.0001. Larger values of lr make the gradients explode and result in inf. I have tried both lr=0.001 and lr=0.0001, and it works fine for me. Please try it and let me know.
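For reference, a sketch of that change against the question's script (everything else stays the same; 0.001 is just one of the values suggested above):

    # Same model and loss as in the question; only the learning rate changes.
    optim = torch.optim.SGD(model.parameters(), lr=0.001)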

Upvotes: 3

prosti

Reputation: 46291

Here is an example of how this all happens. You can try running this program, which essentially simulates an r-layer deep network.

    import torch
    import math
    import matplotlib.pyplot as plt

    def stat(t, p=True):
        # Print and return the mean and std of a tensor.
        m = t.mean()
        s = t.std()
        if p:
            print(f"MEAN: {m}, STD: {s}")
        return m, s

    _m = []
    _s = []

    c = 100   # layer width
    r = 50    # repeat steps (depth of the network)
    x = torch.randn(c)
    m = torch.randn(c, c)  # /math.sqrt(c) would keep the scale stable
    stat(x)

    for _ in range(r):
        x = m @ x
        _1, _2 = stat(x, False)
        _m.append(_1)
        _s.append(_2)

    stat(x)

    plt.plot(_m)
    plt.plot(_s)
    plt.legend(["mean", "std"])
    plt.show()

(Plot of the recorded mean and std of x over the r multiplication steps.)

Upvotes: 0

Mushegh

Reputation: 127

Once the loss becomes inf after a certain pass, your model gets corrupted after backpropagating. This probably happens because the values in the "Salary" column are too big. Try normalizing the salaries.
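A minimal sketch of that kind of target scaling, assuming the `y_temp` array and training loop from the question (the choice of scale factor is just illustrative):

    # Scale the salaries down so the targets are roughly O(1); keep the
    # factor so predictions can be mapped back to the original units.
    y_scale = y_temp.max()
    Y_train = torch.FloatTensor(y_temp / y_scale)

    # ... train exactly as in the question ...

    # Convert a prediction back into salary units.
    pred_salary = model(test_exp).item() * y_scale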

Alternatively, you could initialize the parameters by hand (rather than letting them be initialized randomly): for instance, set the bias term to the average of the salaries and the slope of the line to 0. That way the initial model is close enough to the optimal solution that the loss does not blow up.
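A sketch of that manual initialization on the question's single-feature model (assuming the `model.linear` layer and `Y_train` tensor defined there):

    # Start from a flat line at the mean salary instead of a random line.
    with torch.no_grad():
        model.linear.weight.fill_(0.0)                  # slope = 0
        model.linear.bias.fill_(Y_train.mean().item())  # intercept = mean salary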

Upvotes: 5
