Reputation:
I am trying to manually calculate the gradient of a scalar cost with respect to a weight vector, element by element, by means of a scan operation. However, this does not work: it always fails with an error saying that a Subtensor is not differentiable.
To make sure that the gradient can be calculated:
T.grad(cost, p2)
works perfectly. That means p2 is not disconnected from the cost. However, when I try the following:
def differentiate_element(i, p2, c):
    p2element = p2[i]
    return T.grad(c, p2element)

h2, h2_updates = theano.scan(differentiate_element,
                             sequences=T.arange(p2.shape[0]),
                             non_sequences=[p2, cost])
I get the error 'theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: Subtensor{int64}.0'
This question has already been asked before (Defining a gradient with respect to a subtensor in Theano) but was not answered satisfactorily. That is, assigning p2[i] to its own variable as shown above does not do the trick.
Adding the option disconnected_inputs='ignore' to the T.grad call inside the inner loop does remove the error, but it no longer produces a correct output, as shown in the following short example:
import numpy
import theano
import theano.tensor as T

p2 = theano.shared(name="P2", value=numpy.zeros(100, dtype=theano.config.floatX), borrow=True)
x = T.scalar('x')
cost = T.sum(x * p2)
gradient = T.grad(cost, p2)

def differentiate_element(i, p2, c):
    p2element = p2[i]
    return T.grad(c, p2element, disconnected_inputs='ignore')

gradient2, grad2_updates = theano.scan(differentiate_element,
                                       sequences=T.arange(p2.shape[0]),
                                       non_sequences=[p2, cost])

f = theano.function([x], gradient)
g = theano.function([x], gradient2, updates=grad2_updates)

print(f(20))
print(g(20))
The first call prints an array containing 20's; the second prints an array containing 0's.
Upvotes: 1
Views: 605
Reputation: 2738
Instead of trying to compute a gradient inside the scan function for each subtensor, you should compute the gradient beforehand and then iterate over the elements you need:
p2_grad = T.grad(cost, p2)

def differentiate_element(i, p2_grad):
    return p2_grad[i]

p2elements_grads, h2_updates = theano.scan(differentiate_element,
                                           sequences=T.arange(p2_grad.shape[0]),
                                           non_sequences=[p2_grad])
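Applied to the toy example from the question, that approach looks roughly like this (a minimal sketch reusing the question's p2/x/cost setup; the step-function parameter name p2_grad is just chosen here for clarity):

import numpy
import theano
import theano.tensor as T

# same toy setup as in the question
p2 = theano.shared(name="P2", value=numpy.zeros(100, dtype=theano.config.floatX), borrow=True)
x = T.scalar('x')
cost = T.sum(x * p2)

# build the full gradient once, outside the scan
p2_grad = T.grad(cost, p2)

# inside the scan step, only index into the precomputed gradient
def differentiate_element(i, p2_grad):
    return p2_grad[i]

p2elements_grads, h2_updates = theano.scan(differentiate_element,
                                           sequences=T.arange(p2_grad.shape[0]),
                                           non_sequences=[p2_grad])

g = theano.function([x], p2elements_grads, updates=h2_updates)
print(g(20))   # an array of 20's, matching T.grad(cost, p2) directly

Since each scan step only slices an already-built gradient, no T.grad call happens inside the loop and the DisconnectedInputError cannot occur.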
edit
Since the main concern when calculating the Hessian diagonal is that you do not want to compute the whole Hessian and waste computational resources, what you can do to avoid the DisconnectedInputError is to add the disconnected_inputs='ignore' keyword argument to T.grad:
def differentiate_element(i, p2, c):
    p2element = p2[i]
    return T.grad(c, p2element, disconnected_inputs='ignore')

h2, h2_updates = theano.scan(differentiate_element,
                             sequences=T.arange(p2.shape[0]),
                             non_sequences=[p2, cost])
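If the end goal really is the diagonal of the Hessian, another pattern worth mentioning is the one Theano's documentation uses for Hessian computations: take the gradient once outside the scan and differentiate its i-th element inside the step function, keeping only the i-th entry of the resulting row. A rough sketch of that idea (the squared cost below is made up purely so the second derivative is non-zero; it is not the question's original cost):

import numpy
import theano
import theano.tensor as T

p2 = theano.shared(name="P2", value=numpy.zeros(100, dtype=theano.config.floatX), borrow=True)
x = T.scalar('x')
cost = T.sum(x * p2 ** 2)   # hypothetical cost with a non-zero second derivative

g = T.grad(cost, p2)        # full gradient, built once outside the scan

def hess_diag_element(i, g, p2):
    # differentiate the i-th gradient element w.r.t. the whole vector,
    # then keep only the i-th entry of that Hessian row
    return T.grad(g[i], p2)[i]

hess_diag, hd_updates = theano.scan(hess_diag_element,
                                    sequences=T.arange(p2.shape[0]),
                                    non_sequences=[g, p2])

hess_fn = theano.function([x], hess_diag, updates=hd_updates)
print(hess_fn(20))   # an array of 40's (the second derivative of x * p2_i**2 is 2*x)

Each scan step still builds one Hessian row internally, but only its diagonal entry is kept, so the full Hessian matrix is never part of the compiled function's output.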
Upvotes: 1