Reputation: 363
I am experimenting with a simple two-layer neural network in PyTorch, feeding in only three inputs of size 10 each, with a single value as output. I have normalised the inputs and lowered the learning rate. It is my understanding that a two-layer fully connected network should be able to trivially fit this data.
Features:
0.8138 1.2342 0.4419 0.8273 0.0728 2.4576 0.3800 0.0512 0.6872 0.5201
1.5666 1.3955 1.0436 0.1602 0.1688 0.2074 0.8810 0.9155 0.9641 1.3668
1.7091 0.9091 0.5058 0.6149 0.3669 0.1365 0.3442 0.9482 1.2550 1.6950
[torch.FloatTensor of size 3x10]
Targets:
[124, 125, 122]
[torch.FloatTensor of size 3]
The code is adapted from a simple example, and I am using MSELoss as the loss function. The loss diverges to infinity after just a few iterations:
import numpy as np
import torch
from torch.autograd import Variable

features = torch.from_numpy(np.array(features))
x_data = Variable(torch.Tensor(features))
y_data = Variable(torch.Tensor(targets))

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(10, 5)
        self.linear2 = torch.nn.Linear(5, 1)

    def forward(self, x):
        l_out1 = self.linear(x)
        y_pred = self.linear2(l_out1)
        return y_pred

model = Model()
criterion = torch.nn.MSELoss(size_average=False)  # sums, rather than averages, the squared errors
optim = torch.optim.SGD(model.parameters(), lr=0.001)

def main():
    for iteration in range(1000):
        y_pred = model(x_data)
        loss = criterion(y_pred, y_data)
        print(iteration, loss.data[0])  # pre-0.4 way to read a scalar loss
        optim.zero_grad()
        loss.backward()
        optim.step()

main()
Any help would be appreciated. Thanks.
EDIT:
Indeed, it seems this was simply due to the learning rate being too high. Setting it to 0.00001 fixes the divergence, albeit with very slow convergence.
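For reference, the divergence is consistent with the combination of size_average=False (which sums rather than averages the squared errors) and targets around 124: the initial loss is roughly 3 × 124² ≈ 46,000, and the resulting gradients make SGD with lr = 0.001 overshoot further on every step. A minimal sketch of an alternative fix, scaling the targets to roughly unit range so the original learning rate no longer diverges (the scaling constant, the stand-in features, and the modern-PyTorch style are assumptions, not from the original post):

import torch

x_data = torch.randn(3, 10)               # stand-in for the normalised features
y_data = torch.tensor([124., 125., 122.])
y_scale = y_data.abs().max()              # assumed scaling constant
y_scaled = y_data / y_scale               # targets now near 1.0

model = torch.nn.Sequential(
    torch.nn.Linear(10, 5),
    torch.nn.Linear(5, 1),
)
criterion = torch.nn.MSELoss()            # default mean reduction
optim = torch.optim.SGD(model.parameters(), lr=0.001)

for iteration in range(1000):
    y_pred = model(x_data).squeeze(1)
    loss = criterion(y_pred, y_scaled)
    optim.zero_grad()
    loss.backward()
    optim.step()

print((model(x_data).squeeze(1) * y_scale).tolist())  # back in the original units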
Upvotes: 3
Views: 1895
Reputation: 11
This is because you are not using a non-linearity between the layers, so your network is still a linear model (stacking two Linear layers is equivalent to a single one).
You can use ReLU to make it non-linear, for example by changing the forward method like this:
...
l_out1 = torch.nn.functional.relu(self.linear(x))
y_pred = self.linear2(l_out1)
...
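For completeness, a short sketch of the question's Model class with the ReLU placed between the two layers (this mirrors the class from the question; only the activation is new):

import torch
import torch.nn.functional as F

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(10, 5)
        self.linear2 = torch.nn.Linear(5, 1)

    def forward(self, x):
        l_out1 = F.relu(self.linear(x))  # non-linearity between the two layers
        y_pred = self.linear2(l_out1)    # linear output head for regression
        return y_pred

Keeping the output layer linear is the usual choice for regression; a ReLU on the final output would clamp negative predictions to zero and can stall training early on.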
Upvotes: 1
Reputation: 1434
Maybe you can try to predict log(y) instead of y to improve convergence further. The Adam optimizer (adaptive learning rates) should also help, as should batch normalization (for example, between your linear layers).
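A minimal sketch combining these suggestions (the exact arrangement, the stand-in data, and the default Adam learning rate are assumptions, not from this answer):

import torch

x_data = torch.randn(3, 10)                    # stand-in for the question's features
y_data = torch.tensor([124., 125., 122.])

model = torch.nn.Sequential(
    torch.nn.Linear(10, 5),
    torch.nn.BatchNorm1d(5),                   # normalise activations between the linear layers
    torch.nn.Linear(5, 1),
)
criterion = torch.nn.MSELoss()
optim = torch.optim.Adam(model.parameters())   # adaptive learning rate, default lr=1e-3

y_log = torch.log(y_data)                      # regress against log(y)
for iteration in range(1000):
    y_pred = model(x_data).squeeze(1)
    loss = criterion(y_pred, y_log)
    optim.zero_grad()
    loss.backward()
    optim.step()

y_hat = torch.exp(model(x_data).squeeze(1))    # map predictions back to y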
Upvotes: 0