Reputation: 1000
First: I am only a few days in with TensorFlow, so please bear with me.
I started out from the cifar10 tutorial code and I am now using a combination of convolutions and eigenvalue decompositions that breaks the symbolic differentiation. That is, the graph gets built, but upon calling train()
the script halts with "No gradient defined for operation [...] (op type: SelfAdjointEig)". No surprise there.
The inputs to the subgraph in question are still just the input feature maps and the filters being used. I have the formulas for the gradients at hand, and they should be straightforward to implement given the subgraph's inputs and the gradient with respect to its output.
From what I can see in the docs, I can register a gradient method for custom Ops with RegisterGradient, or override them with the experimental gradient_override_map. Both of those should give me access to exactly the things I need. For example, searching on GitHub I find a lot of examples that access the op's inputs as op.inputs[0] or similar.
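For reference, the pattern in those examples looks roughly like this (the op type name and the gradient formula here are just placeholders):
import tensorflow as tf

# Register a gradient function for some op type and compute the
# gradient from op.inputs and the incoming gradient.
@tf.RegisterGradient("MyOp")
def _my_op_grad(op, grad):
    x = op.inputs[0]       # the op's first input
    return grad * 2.0 * x  # gradient w.r.t. that input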
The problem is that I want to "shortcut" an entire subgraph, not a single op, so there is no single op to decorate. Since this happens in one of the convolutional layers of the cifar10 example, I tried using the scope object for that layer. Conceptually, what enters and exits that scope's graph is exactly what I want, so if I could somehow override the gradients for the whole scope, that would already do it.
I saw tf.Graph.create_op, which (I think) I could use to register a new type of operation whose gradient computation I could then override with the aforementioned methods. But I don't see a way of defining that op's forward pass without writing it in C++...
Maybe I am approaching this the wrong way entirely? Since all of my forward and backward operations can be implemented with the Python interface, I obviously want to avoid implementing anything in C++.
Upvotes: 18
Views: 8163
Reputation: 956
Here is an approach that works with TensorFlow 2.0. Note that in 2.0 we are happy to have two different autodiff mechanisms: GradientTape for eager mode and tf.gradients for non-eager mode (here called "lazy"). The example below demonstrates that tf.custom_gradient works both ways.
import tensorflow as tf
assert tf.version.VERSION.startswith('2.')

import numpy as np
from tensorflow.python.framework.ops import disable_eager_execution, enable_eager_execution
from tensorflow.python.client.session import Session


@tf.custom_gradient
def mysquare(x):
    # Forward pass plus a hand-written gradient function.
    res = x * x

    def _grad(dy):
        return dy * (2 * x)

    return res, _grad


def run_eager():
    # Eager mode: differentiate with GradientTape.
    enable_eager_execution()
    x = tf.constant(np.array([[1, 2, 3], [4, 5, 6]]).astype('float32'))
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = tf.reduce_sum(mysquare(x))
    dy_dx = tape.gradient(y, x)
    print('Eager mode')
    print('x:\n', x.numpy())
    print('y:\n', y.numpy())
    print('dy_dx:\n', dy_dx.numpy())


def run_lazy():
    # Graph ("lazy") mode: differentiate with tf.gradients and run in a Session.
    disable_eager_execution()
    x = tf.constant(np.array([[1, 2, 3], [4, 5, 6]]).astype('float32'))
    y = tf.reduce_sum(mysquare(x))
    dy_dx = tf.gradients(y, x)
    with Session() as s:
        print('Lazy mode')
        print('x:\n', x.eval(session=s))
        print('y:\n', y.eval(session=s))
        assert len(dy_dx) == 1
        print('dy_dx:\n', dy_dx[0].eval(session=s))


if __name__ == '__main__':
    run_eager()
    run_lazy()
Upvotes: 0
Reputation: 769
From TensorFlow 1.7 onward, tf.custom_gradient is the way to go.
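For example, here is a minimal sketch in 1.x graph mode, assuming the log1pexp example from the tf.custom_gradient docs; the same pattern lets you hand-write the gradient of a whole subgraph at once:
import tensorflow as tf

@tf.custom_gradient
def log1pexp(x):
    # Forward pass: an arbitrary subgraph of ops.
    e = tf.exp(x)
    y = tf.log(1 + e)

    def grad(dy):
        # Hand-written gradient for the whole subgraph:
        # d/dx log(1 + exp(x)) = sigmoid(x), written in a stable form.
        return dy * (1 - 1 / (1 + e))

    return y, grad


x = tf.placeholder(tf.float32)
y = log1pexp(x)
dy_dx = tf.gradients(y, x)[0]

with tf.Session() as sess:
    print(sess.run(dy_dx, feed_dict={x: 100.0}))  # 1.0 rather than NaN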
Upvotes: 2
Reputation: 11
How about multiplying and dividing by t, instead of adding and subtracting it?
t = g(x)
y = tf.stop_gradient(f(x) / t) * t
Upvotes: 0
Reputation: 57893
Here's a trick from Sergey Ioffe:
Suppose you want a group of ops that behaves as f(x) in the forward pass but as g(x) in the backward pass. You implement it as
t = g(x)
y = t + tf.stop_gradient(f(x) - t)
So in your case, your g(x) could be an identity op with a custom gradient attached via gradient_override_map.
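A minimal sketch of that combination (the registered name "OverrideWithG" and the pass-through gradient are illustrative, not part of the original trick):
import tensorflow as tf

# Register a hand-written gradient under a new name.
@tf.RegisterGradient("OverrideWithG")
def _override_grad(op, grad):
    # The backward pass of g(x) goes here; this sketch simply passes
    # the incoming gradient through unchanged.
    return grad


def forward_f_backward_g(x, f):
    graph = tf.get_default_graph()
    # Make tf.identity use the gradient registered above.
    with graph.gradient_override_map({"Identity": "OverrideWithG"}):
        t = tf.identity(x)  # g(x): identity in the forward pass
    # Forward value is f(x); gradients flow only through t.
    return t + tf.stop_gradient(f(x) - t)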
Upvotes: 32