user458577

Reputation:

Theano gradient of subtensor

I am trying to compute the gradient of a scalar cost with respect to a weight vector manually, element by element, by means of a scan operation. However, this does not work: it always fails with an error saying that a Subtensor is not differentiable.

To make sure that the gradient can be calculated:

T.grad(cost, p2) 

works perfectly. That means p2 is not disconnected from the cost. However, when I try the following:

def differentiate_element(i, p2, c):
    p2element = p2[i]
    return T.grad(c, p2element)

h2, h2_updates = theano.scan(differentiate_element,
                             sequences=T.arange(p2.shape[0]),
                             non_sequences=[p2, cost])

I get the error 'theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Subtensor{int64}.0'

This question has already been asked before: Defining a gradient with respect to a subtensor in Theano, but it was not answered satisfactorily. That is, assigning p2[i] to its own variable as suggested there does not do the trick.
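To make the failure concrete, here is a minimal reproduction of that workaround (my own sketch; variable names are mine). Binding the subtensor to a Python variable does not put it on the graph the cost was built from, so the same error is raised:

import numpy
import theano
import theano.tensor as T

p2 = theano.shared(numpy.zeros(3, dtype=theano.config.floatX))
cost = T.sum(2.0 * p2)
p2_sub = p2[0]  # a NEW Subtensor node; cost was not built from it
try:
    T.grad(cost, p2_sub)  # raises DisconnectedInputError
except theano.gradient.DisconnectedInputError as e:
    print("fails as expected:", e)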

Adding the option disconnected_inputs='ignore' to the inner loop does remove the error, but it no longer produces a correct output, as the following short example shows:

import numpy
import theano
import theano.tensor as T
p2 = theano.shared(name="P2", value=numpy.zeros(100, dtype=theano.config.floatX), borrow=True)
x = T.scalar('x')
cost = T.sum(x * p2)

gradient = T.grad(cost, p2)

def differentiate_element(i, p2, c):
    p2element = p2[i]
    return T.grad(c, p2element, disconnected_inputs='ignore')

gradient2, grad2_updates = theano.scan(differentiate_element,
                                       sequences=T.arange(p2.shape[0]),
                                       non_sequences=[p2, cost])

f = theano.function([x], gradient)
g = theano.function([x], gradient2, updates=grad2_updates)

print(f(20))
print(g(20))

The first prints an array of 20s; the second prints an array of 0s. This is because each p2[i] created inside the scan really is disconnected from the pre-built cost graph, so with disconnected_inputs='ignore' T.grad silently returns a zero gradient for it.

Upvotes: 1

Views: 605

Answers (1)

Makis Tsantekidis

Reputation: 2738

Instead of trying to compute a gradient inside the scan function for each subtensor, you should just compute the gradient beforehand and then iterate through the elements you need:

p2_grad = T.grad(cost, p2)

def differentiate_element(i, p2_grad):
    # simply pick out the i-th element of the precomputed gradient
    p2element = p2_grad[i]
    return p2element

p2elements_grads, h2_updates = theano.scan(differentiate_element,
                                           sequences=T.arange(p2_grad.shape[0]),
                                           non_sequences=[p2_grad])
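For instance (a hypothetical usage sketch, reusing x, cost and p2 from the question's example), the scanned result can be compiled and checked against the direct gradient:

g2 = theano.function([x], p2elements_grads, updates=h2_updates)
print(g2(20))  # prints an array of 20s, matching theano.function([x], T.grad(cost, p2))(20)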

Edit

Since the main concern in calculating the Hessian diagonal is that you do not want to compute the whole Hessian and waste computational resources, what you can do to avoid the DisconnectedInputError is add the disconnected_inputs='ignore' keyword argument to T.grad:

def differentiate_element(i, p2, c):
    p2element = p2[i]
    return T.grad(c, p2element, disconnected_inputs='ignore')

h2, h2_updates = theano.scan(differentiate_element,
                             sequences=T.arange(p2.shape[0]),
                             non_sequences=[p2, cost])
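If the actual goal is the Hessian diagonal, one alternative sketch (my assumption about the use case; names are mine) differentiates each component of the first gradient with respect to the full p2 and keeps only the matching entry. No DisconnectedInputError can occur here, because g[i] genuinely is part of a graph built from p2:

g = T.grad(cost, p2)

def hessian_diag_element(i, g, p2):
    # i-th row of the Hessian, of which we keep only entry i
    return T.grad(g[i], p2)[i]

hess_diag, hd_updates = theano.scan(hessian_diag_element,
                                    sequences=T.arange(p2.shape[0]),
                                    non_sequences=[g, p2])

The full Hessian matrix is never materialized as a single graph node; each scan step computes just one row and discards everything but its diagonal entry.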

Upvotes: 1
