Reputation: 1037
I'm attempting to convert old code to PyTorch as an experiment. Ultimately, I will be doing regression on a 10,000+ × 100 matrix, updating weights and whatnot appropriately.
Trying to learn, I'm slowly scaling up on toy examples, but I'm hitting a wall with the following sample code.
import torch
import torch.nn as nn
import torch.nn.functional as funct
from torch.autograd import Variable

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x_data = Variable(torch.Tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]),
                  requires_grad=True)
y_data = Variable(torch.Tensor([[2.0], [4.0], [6.0]]))

w = Variable(torch.randn(2, 1, requires_grad=True))
b = Variable(torch.randn(1, 1, requires_grad=True))

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(2, 1)  ## 2 features per entry. 1 output

    def forward(self, x2, w2, b2):
        y_pred = x2 @ w2 + b2
        return y_pred

model = Model()

criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):
    y_pred = model(x_data, w, b)        # Get prediction
    loss = criterion(y_pred, y_data)    # Calc loss
    print(epoch, loss.data.item())      # Print loss
    optimizer.zero_grad()               # Zero gradient
    loss.backward()                     # Calculate gradients
    optimizer.step()                    # Update w, b
However, when I run this, the loss is always the same, and investigating shows that my w and b never actually change. I'm a bit lost as to what's going on here.
Ultimately, I'd like to be able to store the results of the "new" w and b to compare across iterations and datasets.
Upvotes: 0
Views: 2167
Reputation: 13113
It looks like a case of cargo cult programming to me. Notice that your Model class doesn't make use of self in forward, so it is effectively a "regular" (non-method) function, and model is entirely stateless: the optimizer is given model.parameters(), which only contains the unused self.linear, while the w and b actually used in forward are never updated. The simplest fix to your code is to make the optimizer aware of w and b, by creating it as optimizer = torch.optim.SGD([w, b], lr=0.01). I have also rewritten model as a plain function:
import torch
import torch.nn as nn

# torch.autograd.Variable is roughly equivalent to requires_grad=True
# and has been deprecated since PyTorch 0.4
# your code gives no reason to have `requires_grad=True` on `x_data`
x_data = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
y_data = torch.tensor([[2.0], [4.0], [6.0]])

w = torch.randn(2, 1, requires_grad=True)
b = torch.randn(1, 1, requires_grad=True)

def model(x2, w2, b2):
    return x2 @ w2 + b2

# note: size_average is deprecated; reduction='sum' is the modern equivalent
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD([w, b], lr=0.01)  # optimizer now knows about w and b

for epoch in range(10):
    y_pred = model(x_data, w, b)
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
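Since you also want to compare w and b across iterations, here is a minimal sketch of how you could snapshot them inside the training loop above (the history list is just an illustrative name, not part of the original code); detach().clone() copies the current values without keeping them attached to the autograd graph:

history = []  # illustrative container for per-epoch snapshots

for epoch in range(10):
    y_pred = model(x_data, w, b)
    loss = criterion(y_pred, y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # snapshot copies of the updated parameters after each step
    history.append((w.detach().clone(), b.detach().clone()))

print(history[0])   # parameters after the first epoch
print(history[-1])  # parameters after the last epoch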
That being said, nn.Linear is built to simplify this procedure. It automatically creates an equivalent of both w and b, called self.weight and self.bias, respectively. Also, self.__call__(x) is equivalent to the forward definition of your Model, in that it returns x @ self.weight.t() + self.bias.
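A quick sanity check of that equivalence (illustrative, not from the original answer):

lin = nn.Linear(2, 1)
x = torch.randn(3, 2)
# nn.Linear applies x @ weight.t() + bias under the hood
assert torch.allclose(lin(x), x @ lin.weight.t() + lin.bias)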
In other words, you can also use this alternative code:
import torch
import torch.nn as nn

x_data = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
y_data = torch.tensor([[2.0], [4.0], [6.0]])

model = nn.Linear(2, 1)

criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):
    y_pred = model(x_data)
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
where model.parameters() can be used to enumerate the model parameters (equivalent to the manually created list [w, b] above). To access your parameters (load, save, print, whatever), use model.weight and model.bias.
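For example, a minimal sketch (the filename is just a placeholder):

print(model.weight)  # shape (1, 2), requires_grad=True
print(model.bias)    # shape (1,), requires_grad=True

# save and restore all parameters via the module's state_dict
torch.save(model.state_dict(), "linear_model.pt")
model.load_state_dict(torch.load("linear_model.pt"))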
Upvotes: 2