Reputation: 86640
I'm looking at this implementation of SGD for PyTorch: https://pytorch.org/docs/stable/_modules/torch/optim/sgd.html#SGD
And I see some strange calculations which I don't understand.
For instance, take a look at p.data.add_(-group['lr'], d_p). It makes sense to think that the two arguments are multiplied together, right? That's how SGD works: -lr * grads.
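In other words, I'd expect the call to be shorthand for something like this (my own rewrite, not from the PyTorch source):

import torch

lr = 0.01
p = torch.tensor([1.0, 2.0, 3.0])    # a parameter
d_p = torch.tensor([0.5, 0.5, 0.5])  # its gradient

p = p + (-lr) * d_p                  # p <- p - lr * d_p
print(p)                             # tensor([0.9950, 1.9950, 2.9950])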
But the documentation of the function doesn't say anything about this.
And what is even more confusing: although this SGD code actually works (I tested it by copying the code and adding prints right after the add_ call), I can't call add_ with two arguments the way it does:
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([6, 10, 15])
c = torch.tensor([100, 100, 100])
a.add_(b, c)  # this returns an error about using too many arguments
print(a)
What's going on here? What am I missing?
Upvotes: 5
Views: 3262
Reputation: 812
This works for scalars:

import torch as t

a = t.tensor(1)
b = t.tensor(2)
c = t.tensor(3)
a.add_(b, c)
print(a)  # tensor(7)
Or a can be a tensor:

a = t.tensor([[1, 1], [1, 1]])
b = t.tensor(2)
c = t.tensor(3)
a.add_(b, c)
print(a)  # tensor([[7, 7], [7, 7]])
The output is 7 in both cases because the second positional argument is matched against the overload add_(Tensor other, Number alpha), which computes self += alpha * other; here that is 1 + 3 * 2 = 7. A zero-dimensional tensor like t.tensor(3) is accepted where a Number is expected, which is why these scalar examples work while your call with three multi-element tensors does not match any overload.
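As far as I can tell, the SGD line you quoted goes through the companion overload with the arguments swapped, add_(Number alpha, Tensor other), which was later deprecated in favor of passing alpha as an explicit keyword. A minimal sketch of how the two spellings line up (the concrete values are mine, not from the SGD source):

import torch

lr = 0.1
p = torch.tensor([1.0, 2.0, 3.0])     # stand-in for p.data
d_p = torch.tensor([6.0, 10.0, 15.0]) # stand-in for the gradient

# explicit modern spelling of p.data.add_(-group['lr'], d_p):
p.add_(d_p, alpha=-lr)  # p += (-lr) * d_p
print(p)                # tensor([0.4000, 1.0000, 1.5000])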
Upvotes: 3