Reputation: 31
I wanted to write my own activation function, but I ran into a problem: it says the matrix multiplication will call .data. I searched but found little useful information. Any help will be appreciated. The error information is:
Traceback (most recent call last):
File "defineAutogradFuncion.py", line 126, in <module>
test = gradcheck(argmin, input, eps=1e-6, atol=1e-4)
File "/home/zhaosl/.local/lib/python2.7/site-packages/torch/autograd/gradcheck.py", line 154, in gradcheck
output = func(*inputs)
File "defineAutogradFuncion.py", line 86, in forward
output = output.mm(dismap).squeeze(-1)
File "/home/zhaosl/.local/lib/python2.7/site-packages/torch/autograd/variable.py", line 578, in mm
output = Variable(self.data.new(self.data.size(0), matrix.data.size(1)))
File "/home/zhaosl/.local/lib/python2.7/site-packages/torch/tensor.py", line 374, in data
raise RuntimeError('cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?')
RuntimeError: cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?
class Softargmin(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """
    @staticmethod
    def forward(self, input):
        """
        In the forward pass we receive a Tensor containing the input and return a
        Tensor containing the output. You can cache arbitrary Tensors for use in the
        backward pass using the save_for_backward method.
        """
        #P = Fun.softmax(-input)
        inputSqueeze = input.squeeze(-1)
        P = Fun.softmax(-inputSqueeze)
        self.save_for_backward(P)
        output = P.permute(0, 2, 3, 1)
        dismap = torch.arange(0, output.size(-1) + 1).unsqueeze(1)
        output = output.mm(dismap).squeeze(-1)
        return output

    @staticmethod
    def backward(self, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        P, = self.saved_tensors
        P = P.unsqueeze(-1)
        Pk = torch.squeeze(P, -1).permute(0, 2, 3, 1)
        k = torch.arange(0, Pk.size(-1) + 1).unsqueeze(1)
        sumkPk = Pk.mm(k)
        sumkPk = sumkPk.unsqueeze(1).expand(P.size())
        i = torch.arange(0, Pk.size(-1) + 1).view(1, -1, 1, 1, 1).expand(P.size())
        grad_output_expand = grad_output.unsqueeze(-1).unsqueeze(1).expand(P.size())
        grad_input = grad_output_expand * P * (sumkPk - i)
        return grad_input
Upvotes: 3
Views: 11925
Reputation: 204
Here is an example of a simple activation that uses the built-in torch activation functions internally; it works and can be extended to a custom one.
import torch as pt
import torch.nn as nn
from torch.nn.modules import Module

# custom activation
# do_ratio, features, n_layer0, n_layer1, n_layer2 and enable_bias are
# assumed to be defined elsewhere in the script
class Act(Module):
    def forward(self, z):
        if do_ratio > 0:
            return nn.functional.dropout(pt.tanh(z), do_ratio)
        else:
            return pt.tanh(z)

act_fn = Act()

model = pt.nn.Sequential(
    pt.nn.Linear(features, n_layer0, bias=enable_bias),
    act_fn,
    pt.nn.Linear(n_layer0, n_layer1, bias=enable_bias),
    act_fn,
    pt.nn.Linear(n_layer1, n_layer2, bias=enable_bias)
)
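For completeness, a minimal usage sketch building on the Act class above; the do_ratio value and the 4 x 8 input size are placeholders, not part of the original answer:

do_ratio = 0.2                # hypothetical dropout ratio read inside Act.forward
act = Act()
z = pt.randn(4, 8)            # dummy batch of pre-activation values
a = act(z)                    # tanh followed by dropout; same shape as z
print(a.size())               # torch.Size([4, 8])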
Upvotes: 0
Reputation: 186
The most basic element in PyTorch is a Tensor, which is the equivalent of numpy.ndarray, with the only difference being that a Tensor can be put onto a GPU for any computation.

A Variable is a wrapper around a Tensor that contains three attributes: data, grad and grad_fn. data contains the original Tensor; grad contains the derivative/gradient of some value with respect to this Variable; and grad_fn is a pointer to the Function object that created this Variable. The grad_fn attribute is actually the key for autograd to work properly, since PyTorch uses those pointers to build the computation graph at each iteration and carry out the differentiations for all Variables in your graph accordingly. This is not only about differentiating correctly through this custom Function object you are creating.

Hence, whenever you create a Tensor in your computation that requires differentiation, wrap it as a Variable. First, this enables the Tensor to save the resulting derivative/gradient value after you call backward(). Second, this helps autograd build a correct computation graph.

Another thing to notice is that whenever you send a Variable into your computation graph, any value that is computed using this Variable will automatically be a Variable. So you don't have to manually wrap all Tensors in your computation graph.
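A minimal sketch of that behaviour, using the pre-0.4 torch.autograd.Variable API the question is written against:

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)   # wrap the Tensor so autograd can track it
y = x * 2 + 1                                         # y is automatically a Variable
print(type(y))                                        # torch.autograd.variable.Variable
print(y.grad_fn)                                      # the Function that created y
y.sum().backward()                                    # differentiate through the graph
print(x.grad)                                         # gradient of sum(y) with respect to x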
You might want to take a look at this.
Going back to your error, it's a little difficult to figure out what is really causing the trouble because you are not showing all of your code (for example, how you are using this custom Function in your computation graph). What most likely happened is this: you used this Function in a subgraph that needed to be differentiated through. When PyTorch ran the numerical gradient check on your model to see whether the differentiation is correct, it assumed that every node in that subgraph was a Variable, because that is necessary for differentiation through that subgraph to happen. It then tried to call the data attribute of one of those nodes, most likely because that value is used somewhere in the differentiation, and failed because the node was in fact a Tensor and did not have a data attribute.
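To make that concrete with a sketch based on the forward pass in the question (only the wrapping is shown; everything else is assumed unchanged): dismap is built with torch.arange, so it is a plain Tensor, while output is a Variable during the gradient check, and output.mm(dismap) mixes the two. Wrapping dismap as a Variable keeps every node in that subgraph a Variable:

from torch.autograd import Variable

# inside Softargmin.forward, after output = P.permute(0, 2, 3, 1)
dismap = Variable(torch.arange(0, output.size(-1) + 1).unsqueeze(1))
output = output.mm(dismap).squeeze(-1)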
Upvotes: 4
Reputation: 742
The PyTorch tensors you are using should be wrapped into a torch.autograd.Variable object, like so:

v = torch.autograd.Variable(mytensor)

Autograd assumes that tensors are wrapped in Variables and can then access the data using v.data. The Variable class is the data structure autograd uses to compute derivatives during the backward pass. Make sure the data tensors you pass are wrapped in torch.autograd.Variable.
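Applied to the gradcheck call from the question's traceback, that would look something like the sketch below; the input shape is a placeholder (the question does not show how input is built), and it assumes the Function is invoked through Softargmin.apply:

import torch
from torch.autograd import Variable
from torch.autograd.gradcheck import gradcheck

# placeholder shape; use whatever your model actually expects
input = Variable(torch.randn(1, 4, 4, 3, 1), requires_grad=True)
test = gradcheck(Softargmin.apply, (input,), eps=1e-6, atol=1e-4)
print(test)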
-Mo
Upvotes: 0