Reputation: 325
I'm trying to do simple linear regression with 1 feature. It's a simple 'predict salary given years experience' problem.
The NN trains on years experience (X) and a salary (Y).
For some reason the loss is exploding and ultimately becomes inf or nan.
This is the code I have:
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
dataset = pd.read_csv('./salaries.csv')
x_temp = dataset.iloc[:, :-1].values
y_temp = dataset.iloc[:, 1:].values
X_train = torch.FloatTensor(x_temp)
Y_train = torch.FloatTensor(y_temp)
class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred
model = Model()
loss_func = torch.nn.MSELoss(size_average=False)
optim = torch.optim.SGD(model.parameters(), lr=0.01)
#training
for epoch in range(200):
    # calculate y_pred
    y_pred = model(X_train)

    # calculate loss
    loss = loss_func(y_pred, Y_train)
    print(epoch, "{:.2f}".format(loss.data))

    # backward pass + update weights
    optim.zero_grad()
    loss.backward()
    optim.step()
test_exp = torch.FloatTensor([[8.0]])
print("8 years experience --> ", model(test_exp).data[0][0].item())
As I mentioned, once it starts training, the loss gets super big and ends up showing inf after about the 10th epoch.
I suspect it may have something to do with how I'm loading the data. This is what is in the salaries.csv file:
Years Salary
1.1 39343
1.3 46205
1.5 37731
2 43525
2.2 39891
2.9 56642
3 60150
3.2 54445
3.2 64445
3.7 57189
3.9 63218
4 55794
4 56957
4.1 57081
4.5 61111
4.9 67938
5.1 66029
5.3 83088
Thank you for your help
Upvotes: 2
Views: 23986
Reputation: 1553
Another possibility for getting a nan loss is that the model's input tensor contains nan values. Try filtering nan values out of the model input.
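For example, a minimal sketch (assuming the X_train and Y_train tensors from the question) that drops any rows containing nan before training:
# keep only rows where neither the feature nor the target is NaN
valid = ~(torch.isnan(X_train).any(dim=1) | torch.isnan(Y_train).any(dim=1))
X_train = X_train[valid]
Y_train = Y_train[valid]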
Upvotes: 0
Reputation: 31
Please reduce the learning rate "lr" to 0.001 or 0.0001. Larger values of lr make the gradient explode and result in inf. I have tried both lr=0.001 and lr=0.0001 and it works fine for me. Please try it and let me know.
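For reference, a minimal sketch of the change against the question's code (the exact value may still need tuning for your data):
optim = torch.optim.SGD(model.parameters(), lr=0.0001)  # smaller step size keeps the updates from diverging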
Upvotes: 3
Reputation: 46291
Here is an example of how this all happens. You can run this program, which basically represents an r-layer-deep network.
import torch
import math
import matplotlib.pyplot as plt
def stat(t, p=True):
    # report the mean and standard deviation of a tensor
    m = t.mean()
    s = t.std()
    if p:
        print(f"MEAN: {m}, STD: {s}")
    return m, s

_m = []
_s = []
c = 100  # width of each layer
r = 50   # repeat steps (network depth)
x = torch.randn(c)
m = torch.randn(c, c)  # /math.sqrt(c)
stat(x)
for _ in range(0, r):
    # multiply by the same random matrix over and over, as a deep linear net would
    x = m @ x
    _1, _2 = stat(x, False)
    _m.append(_1)
    _s.append(_2)
stat(x)
plt.plot(_m)
plt.plot(_s)
plt.legend(["mean","std"])
plt.show()
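As an aside, un-commenting the /math.sqrt(c) factor (presumably the intended scaling) keeps the mean and std of x roughly stable across the r steps instead of letting them explode.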
Upvotes: 0
Reputation: 127
Once the loss becomes inf after a certain pass, your model gets corrupted by backpropagating. This probably happens because the values in the "Salary" column are too big. Try normalizing the salaries.
Alternatively, you could initialize the parameters by hand (rather than letting them be initialized randomly), letting the bias term be the average of the salaries and the slope of the line be 0 (for instance). That way the initial model would be close enough to the optimal solution that the loss does not blow up.
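For example, a minimal sketch of both ideas (assuming the X_train, Y_train and Model from the question; the names and the exact scaling are only illustrative):
# Option 1: normalize the targets so the loss starts at a reasonable scale
Y_mean = Y_train.mean()
Y_std = Y_train.std()
Y_norm = (Y_train - Y_mean) / Y_std  # train against Y_norm, then undo with pred * Y_std + Y_mean

# Option 2: hand-initialize the linear layer near a sensible starting point
model = Model()
with torch.no_grad():
    model.linear.weight.fill_(0.0)                   # slope of 0
    model.linear.bias.fill_(Y_train.mean().item())   # bias = average salary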
Upvotes: 5