Reputation: 1000
First: I am only a few days in with TensorFlow, so please bear with me.
I started out from the cifar10 tutorial code and I am now using a combination of convolutions and eigenvalue decompositions that breaks the symbolic differentiation. That is, the graph gets built, but upon calling train()
the script halts with "No gradient defined for operation [...] (op type: SelfAdjointEig)". No surprise there.
The inputs to the subgraph in question are still just the input feature maps and the filters being used. I have the formulas for the gradients at hand, and they should be straightforward to implement given the subgraph's inputs and the gradient with respect to its output.
From what I can see in the docs, I can register a gradient method for custom Ops with RegisterGradient, or override them with the experimental gradient_override_map. Both of those should give me access to exactly the things I need. For example, searching on GitHub I find a lot of examples that access the op's inputs as op.inputs[0] or similar.
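For reference, the pattern in those examples looks roughly like this (the op type name and the gradient formula here are just placeholders):
import tensorflow as tf

# Register a gradient function for some op type and compute the
# gradient from op.inputs and the incoming gradient.
@tf.RegisterGradient("MyOp")
def _my_op_grad(op, grad):
    x = op.inputs[0]       # the op's first input
    return grad * 2.0 * x  # gradient w.r.t. that input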
The problem is that I want to "shortcut" an entire subgraph, not a single op, so there is no single op to decorate. Since this happens in one of the convolutional layers of the cifar10 example, I tried using the scope object for that layer. Conceptually, what enters and exits that scope's graph is exactly what I want, so if I could somehow override the gradients for the whole scope, that would already do it.
I saw tf.Graph.create_op, which (I think) I could use to register a new type of operation whose gradient computation I could then override with the aforementioned methods. But I don't see a way of defining that op's forward pass without writing it in C++...
Maybe I am approaching this the wrong way entirely? Since all of my forward and backward operations can be implemented with the Python interface, I obviously want to avoid implementing anything in C++.
Upvotes: 18
Views: 8163
Reputation: 956
Here is an approach that works with TensorFlow 2.0. Note that in 2.0 we are happy to have two different autodiff mechanisms: GradientTape for eager mode and tf.gradients for non-eager mode (here called "lazy"). The example below demonstrates that tf.custom_gradient works both ways.
import tensorflow as tf
assert tf.version.VERSION.startswith('2.')

import numpy as np
from tensorflow.python.framework.ops import disable_eager_execution, enable_eager_execution
from tensorflow.python.client.session import Session


@tf.custom_gradient
def mysquare(x):
    # Forward pass plus a hand-written gradient function.
    res = x * x

    def _grad(dy):
        return dy * (2 * x)

    return res, _grad


def run_eager():
    # Eager mode: differentiate with GradientTape.
    enable_eager_execution()
    x = tf.constant(np.array([[1, 2, 3], [4, 5, 6]]).astype('float32'))
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = tf.reduce_sum(mysquare(x))
    dy_dx = tape.gradient(y, x)
    print('Eager mode')
    print('x:\n', x.numpy())
    print('y:\n', y.numpy())
    print('dy_dx:\n', dy_dx.numpy())


def run_lazy():
    # Graph ("lazy") mode: differentiate with tf.gradients and run in a Session.
    disable_eager_execution()
    x = tf.constant(np.array([[1, 2, 3], [4, 5, 6]]).astype('float32'))
    y = tf.reduce_sum(mysquare(x))
    dy_dx = tf.gradients(y, x)
    with Session() as s:
        print('Lazy mode')
        print('x:\n', x.eval(session=s))
        print('y:\n', y.eval(session=s))
        assert len(dy_dx) == 1
        print('dy_dx:\n', dy_dx[0].eval(session=s))


if __name__ == '__main__':
    run_eager()
    run_lazy()
Upvotes: 0
Reputation: 769
From TensorFlow 1.7 onward, tf.custom_gradient is the way to go.
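For example, here is a minimal sketch in 1.x graph mode, assuming the log1pexp example from the tf.custom_gradient docs; the same pattern lets you hand-write the gradient of a whole subgraph at once:
import tensorflow as tf

@tf.custom_gradient
def log1pexp(x):
    # Forward pass: an arbitrary subgraph of ops.
    e = tf.exp(x)
    y = tf.log(1 + e)

    def grad(dy):
        # Hand-written gradient for the whole subgraph:
        # d/dx log(1 + exp(x)) = sigmoid(x), written in a stable form.
        return dy * (1 - 1 / (1 + e))

    return y, grad


x = tf.placeholder(tf.float32)
y = log1pexp(x)
dy_dx = tf.gradients(y, x)[0]

with tf.Session() as sess:
    print(sess.run(dy_dx, feed_dict={x: 100.0}))  # 1.0 rather than NaN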
Upvotes: 2
Reputation: 11
How about multiplying and dividing by t, instead of adding and subtracting it?
t = g(x)
y = tf.stop_gradient(f(x) / t) * t
Upvotes: 0
Reputation: 57893
Here's a trick from Sergey Ioffe:
Suppose you want a group of ops that behaves as f(x) in the forward pass but as g(x) in the backward pass. You implement it as
t = g(x)
y = t + tf.stop_gradient(f(x) - t)
So in your case, your g(x) could be an identity op with a custom gradient attached via gradient_override_map.
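A minimal sketch of that combination (the registered name "OverrideWithG" and the pass-through gradient are illustrative, not part of the original trick):
import tensorflow as tf

# Register a hand-written gradient under a new name.
@tf.RegisterGradient("OverrideWithG")
def _override_grad(op, grad):
    # The backward pass of g(x) goes here; this sketch simply passes
    # the incoming gradient through unchanged.
    return grad


def forward_f_backward_g(x, f):
    graph = tf.get_default_graph()
    # Make tf.identity use the gradient registered above.
    with graph.gradient_override_map({"Identity": "OverrideWithG"}):
        t = tf.identity(x)  # g(x): identity in the forward pass
    # Forward value is f(x); gradients flow only through t.
    return t + tf.stop_gradient(f(x) - t)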
Upvotes: 32