Reputation: 349
I am trying to implement a 2-layer neural network using different methods (TensorFlow, PyTorch and from scratch) and then compare their performance on the MNIST dataset.
I am not sure what mistakes I have made, but the accuracy in PyTorch is only about 10%, which is basically random guessing. I suspect the weights are not getting updated at all.
Note that I intentionally use the dataset provided by TensorFlow so that the data stays consistent across the 3 methods, for an accurate comparison.
from tensorflow.examples.tutorials.mnist import input_data
import torch


class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = torch.nn.Linear(784, 100)
        self.fc2 = torch.nn.Linear(100, 10)

    def forward(self, x):
        # x -> (batch_size, 784)
        x = torch.relu(x)
        # x -> (batch_size, 10)
        x = torch.softmax(x, dim=1)
        return x


net = Net()
net.zero_grad()
Loss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

for epoch in range(1000):  # loop over the dataset multiple times
    batch_xs, batch_ys = mnist_m.train.next_batch(100)
    # convert to appropriate settings
    # note the input to the linear layer should be (n_samples, n_features)
    batch_xs = torch.tensor(batch_xs, requires_grad=True)
    # batch_ys -> (batch_size,)
    batch_ys = torch.tensor(batch_ys, dtype=torch.int64)

    # forward
    # output -> (batch_size, 10)
    output = net(batch_xs)
    # result -> (batch_size,)
    result = torch.argmax(output, dim=1)
    loss = Loss(output, batch_ys)

    # backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Upvotes: 3
Views: 2094
Reputation: 24099
The problem here is that you don't apply your fully connected layers fc1 and fc2.
Your forward() currently looks like:
def forward(self, x):
    # x -> (batch_size, 784)
    x = torch.relu(x)
    # x -> (batch_size, 10)
    x = torch.softmax(x, dim=1)
    return x
So if you change it to:
def forward(self, x):
    # x -> (batch_size, 784)
    x = self.fc1(x)  # added layer fc1
    # x -> (batch_size, 100)
    x = torch.relu(x)
    x = self.fc2(x)  # added layer fc2
    # x -> (batch_size, 10)
    x = torch.softmax(x, dim=1)
    return x
It should work.
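If you want to verify that the weights are really being updated now, one quick sanity check is to look at the parameter gradients right after loss.backward(). A minimal sketch (not part of the original code):

for name, param in net.named_parameters():
    # a non-zero gradient norm means optimizer.step() will update this parameter
    print(name, param.grad.norm().item() if param.grad is not None else 'no grad')

With the broken forward() above, fc1 and fc2 never enter the computation graph, so their gradients stay at zero and the weights never move, which explains the ~10% (random-guess) accuracy.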
Regarding Umang Gupta's answer: as I see it, calling zero_grad() before calling backward(), as Mr. Robot did, is just fine. This shouldn't be a problem.
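Both orderings produce the same gradients for a single step; all that matters is that stale gradients are cleared before backward() accumulates new ones. A minimal sketch of the two equivalent variants, using the names from the question:

# variant 1: zero the gradients at the start of the step
optimizer.zero_grad()
output = net(batch_xs)
loss = Loss(output, batch_ys)
loss.backward()
optimizer.step()

# variant 2: zero them right before backward(), as in the question
output = net(batch_xs)
loss = Loss(output, batch_ys)
optimizer.zero_grad()
loss.backward()
optimizer.step()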
Edit:
So I did a short test - I increased the iterations from 1000 to 10000 to get a bigger picture of whether the loss is really decreasing. (Of course I also loaded the data into mnist_m, as this wasn't included in the code you've posted.)
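For reference, loading it could look roughly like this - a sketch using the TF tutorial helper, where the path 'MNIST_data/' is just an assumption on my part:

from tensorflow.examples.tutorials.mnist import input_data

# one_hot=False yields integer class labels, which CrossEntropyLoss expects
mnist_m = input_data.read_data_sets('MNIST_data/', one_hot=False)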
I added a print condition to the code:
if epoch % 1000 == 0:
    print('Epoch', epoch, '- Loss:', round(loss.item(), 3))
Which prints out the loss every 1000 iterations:
Epoch 0 - Loss: 2.305
Epoch 1000 - Loss: 2.263
Epoch 2000 - Loss: 2.187
Epoch 3000 - Loss: 2.024
Epoch 4000 - Loss: 1.819
Epoch 5000 - Loss: 1.699
Epoch 6000 - Loss: 1.699
Epoch 7000 - Loss: 1.656
Epoch 8000 - Loss: 1.675
Epoch 9000 - Loss: 1.659
Tested with PyTorch version 0.4.1
So you can see that with the changed forward() the network is learning now; I left the rest of the code untouched.
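Since your original concern was accuracy rather than loss, you could also evaluate on the test split after training. A sketch, assuming mnist_m was loaded as shown above:

test_xs = torch.tensor(mnist_m.test.images)
test_ys = torch.tensor(mnist_m.test.labels, dtype=torch.int64)
predictions = torch.argmax(net(test_xs), dim=1)
# fraction of test images whose predicted class matches the true label
accuracy = (predictions == test_ys).float().mean().item()
print('Test accuracy:', accuracy)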
Good luck going forward!
Upvotes: 6