zhaosl

Reputation: 31

Pytorch: define custom function

I wanted to write my own activation function, but I ran into a problem: the matrix multiplication complains about calling .data. I searched but found little useful information. Any help would be appreciated. The error is:

    Traceback (most recent call last):
      File "defineAutogradFuncion.py", line 126, in <module>
        test = gradcheck(argmin, input, eps=1e-6, atol=1e-4)
      File "/home/zhaosl/.local/lib/python2.7/site-packages/torch/autograd/gradcheck.py", line 154, in gradcheck
        output = func(*inputs)
      File "defineAutogradFuncion.py", line 86, in forward
        output = output.mm(dismap).squeeze(-1)
      File "/home/zhaosl/.local/lib/python2.7/site-packages/torch/autograd/variable.py", line 578, in mm
        output = Variable(self.data.new(self.data.size(0), matrix.data.size(1)))
      File "/home/zhaosl/.local/lib/python2.7/site-packages/torch/tensor.py", line 374, in data
        raise RuntimeError('cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?')
    RuntimeError: cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?

    class Softargmin(torch.autograd.Function):
        """
        We can implement our own custom autograd Functions by subclassing
        torch.autograd.Function and implementing the forward and backward passes
        which operate on Tensors.
        """
        @staticmethod
        def forward(self, input):
            """
            In the forward pass we receive a Tensor containing the input and return a
            Tensor containing the output. You can cache arbitrary Tensors for use in the
            backward pass using the save_for_backward method.
            """
            #P = Fun.softmax(-input)
            inputSqueeze = input.squeeze(-1)
            P = Fun.softmax(-inputSqueeze)
            self.save_for_backward(P)

            output = P.permute(0, 2, 3, 1)
            dismap = torch.arange(0, output.size(-1) + 1).unsqueeze(1)
            output = output.mm(dismap).squeeze(-1)
            return output

        @staticmethod
        def backward(self, grad_output):
            """
            In the backward pass we receive a Tensor containing the gradient of the loss
            with respect to the output, and we need to compute the gradient of the loss
            with respect to the input.
            """
            P, = self.saved_tensors
            P = P.unsqueeze(-1)
            Pk = torch.squeeze(P, -1).permute(0, 2, 3, 1)
            k = torch.arange(0, Pk.size(-1) + 1).unsqueeze(1)
            sumkPk = Pk.mm(k)
            sumkPk = sumkPk.unsqueeze(1).expand(P.size())
            i = torch.arange(0, Pk.size(-1) + 1).view(1, -1, 1, 1, 1).expand(P.size())
            grad_output_expand = grad_output.unsqueeze(-1).unsqueeze(1).expand(P.size())
            grad_input = grad_output_expand * P * (sumkPk - i)
            return grad_input

Upvotes: 3

Views: 11925

Answers (3)

vlad

Reputation: 204

Here is an example of a simple activation module that uses torch activation functions internally; it works as-is and can be extended into a custom activation.

import torch as pt
import torch.nn as nn
from torch.nn.modules import Module

# example hyperparameters (pick your own values)
do_ratio = 0.5        # dropout ratio; set to 0 to disable dropout
features = 16         # input dimensionality
n_layer0, n_layer1, n_layer2 = 64, 32, 10
enable_bias = True

# custom activation: tanh followed by optional dropout
class Act(Module):
    def forward(self, z):
        if do_ratio > 0:
            return nn.functional.dropout(pt.tanh(z), do_ratio)
        else:
            return pt.tanh(z)

act_fn = Act()
model = pt.nn.Sequential(
        pt.nn.Linear(features, n_layer0, bias=enable_bias),
        act_fn,
        pt.nn.Linear(n_layer0, n_layer1, bias=enable_bias),
        act_fn,
        pt.nn.Linear(n_layer1, n_layer2, bias=enable_bias)
)
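
As a quick usage check (the batch size 8, like the hyperparameter values above, is just an example):

out = model(pt.autograd.Variable(pt.randn(8, features)))
print(out.size())  # (8, n_layer2)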

Upvotes: 0

msd15213

Reputation: 186

The most basic element in PyTorch is a Tensor, which is the equivalent of numpy.ndarray with the only difference being that a Tensor can be put onto a GPU for any computation.
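
For instance, a minimal sketch of that relationship (the names here are purely illustrative):

import torch

x = torch.ones(2, 3)           # a Tensor, analogous to numpy.ones((2, 3))
x_np = x.numpy()               # view the same data as a numpy.ndarray
if torch.cuda.is_available():  # optionally move the Tensor to the GPU
    x = x.cuda()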

A Variable is a wrapper around Tensor that contains three attributes: data, grad and grad_fn. data contains the original Tensor; grad contains the derivative/gradient of some value with respect to this Variable; and grad_fn is a pointer to the Function object that created this Variable. The grad_fn attribute is actually the key for autograd to work properly since PyTorch uses those pointers to build the computation graph at each iteration and carry out the differentiations for all Variables in your graph accordingly. This is not only about differentiating correctly through this custom Function object you are creating.
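
For example, a small sketch of those three attributes, using the pre-0.4 Variable API the question is written against (names are illustrative):

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = (x * x).sum()   # y is a Variable produced by autograd operations
y.backward()        # fills in gradients along the graph

print(x.data)       # the wrapped Tensor
print(x.grad)       # d(y)/d(x), populated by backward()
print(y.grad_fn)    # the Function that created y (None for leaf Variables such as x)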


Hence, whenever you create a Tensor in your computation that requires differentiation, wrap it in a Variable. First, this lets the Tensor store the resulting derivative/gradient value after you call backward(). Second, it helps autograd build a correct computation graph.

Another thing to notice is that whenever you send a Variable into your computation graph, any value that is computed using this Variable will automatically be a Variable. So you don't have to manually wrap all Tensors in your computation graph.
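
A quick sketch of that propagation (illustrative names only):

import torch
from torch.autograd import Variable

a = Variable(torch.randn(3, 3), requires_grad=True)
b = a.mm(a)      # b is automatically a Variable
c = b.sum() * 2  # and so is anything computed from it
print(type(b), b.requires_grad)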

You might want to take a look at this.

Going back to your error: it's a little difficult to figure out what is really causing the trouble because you are not showing all of your code (for example, how you use this custom Function in your computation graph), but I suspect the following is what happened. You used this Function in a subgraph that needed to be differentiated through. When PyTorch ran its numerical gradient check on your model to verify the differentiation, it assumed every node in that subgraph was a Variable, because that is required for differentiation through the subgraph to work. It then tried to call the data attribute of one of those nodes, most likely because that value is used somewhere in the differentiation, and failed because that node was in fact a Tensor and did not have a data attribute.
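
A minimal sketch of that failure mode and the usual remedy, assuming the pre-0.4 API where Tensors and Variables are distinct (the names x and w are purely illustrative):

import torch
from torch.autograd import Variable

x = Variable(torch.randn(2, 3), requires_grad=True)
w = torch.randn(3, 1)     # a plain Tensor created mid-computation

# y = x.mm(w)             # mixing the raw Tensor into the Variable graph is the kind of
                          # thing that raises "cannot call .data on a torch.Tensor"
y = x.mm(Variable(w))     # wrapping it first keeps every node in the graph a Variable
y.sum().backward()
print(x.grad)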

Upvotes: 4

Mo Hossny

Reputation: 742

The PyTorch tensors you are using should be wrapped into a torch.autograd.Variable object, like so:

v = torch.autograd.Variable(mytensor)

Autograd assumes that tensors are wrapped in Variables, which lets it access the underlying data via v.data. The Variable class is the data structure autograd uses to compute derivatives during the backward pass. Make sure the tensors you pass in are wrapped in torch.autograd.Variable.

-Mo

Upvotes: 0
