Stanley Gan

Reputation: 481

Difficulties in understanding higher order derivatives for tf.custom_gradient()

This is based on the example quoted on TensorFlow's website here: https://www.tensorflow.org/api_docs/python/tf/custom_gradient

@tf.custom_gradient
def op_with_fused_backprop(x):
    y, x_grad = fused_op(x)

    def first_order_gradient(dy):
        @tf.custom_gradient
        def first_order_custom(unused_x):
            def second_order_and_transpose(ddy):
                return second_order_for_x(...), gradient_wrt_dy(...)
            return x_grad, second_order_and_transpose
        return dy * first_order_custom(x)
    return y, first_order_gradient

There is a lack of detail on why second_order_and_transpose(ddy) returns two objects. Based on the documentation of tf.custom_gradient, the grad_fn (i.e. second_order_and_transpose()) should return a list of Tensors that are the derivatives of dy w.r.t. unused_x. It is also not clear why they named it unused_x. Does anyone have any insight into this example, or into creating custom gradients for higher-order derivatives in general?
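
For context, here is a minimal first-order sketch of the grad_fn contract as I understand it (my own illustration, not code from the linked docs): grad_fn receives the upstream gradient dy and returns one gradient per input of the op.

import tensorflow as tf

@tf.custom_gradient
def square(x):
    def grad_fn(dy):
        # One input (x), so return one gradient: dy * d(x**2)/dx.
        return dy * 2.0 * x
    return x * x, grad_fn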

Upvotes: 2

Views: 461

Answers (1)

Xinyao Wang

Reputation: 2159

  1. There is a lack of detail on why second_order_and_transpose(ddy) returns two objects.

Based on some examples I played with, I believe you are correct. The official doc is somewhat ambiguous (or incorrect). second_order_and_transpose(ddy) should return only one object: the calculated second-order gradient.

  2. It is also not clear why they named it unused_x.

That is the tricky part. The name unused_x explains itself: you are never going to use it. The goal here is to wrap your second-order calculation in a function called first_order_custom. You calculate the gradient of x inside fused_op and return that (x_grad) instead of anything computed from unused_x.

To make this clearer, here is an example, extended from the official documentation, that defines a second-order gradient for log1pexp:

NOTE: The true second-order gradient is not numerically stable, so let's use (1 - tf.exp(x)) as a stand-in for it, just to make our lives easier.

import tensorflow as tf

@tf.custom_gradient
def log1pexp2(x):
    e = tf.exp(x)
    y = tf.math.log(1 + e)
    x_grad = 1 - 1 / (1 + e)  # first-order gradient: e / (1 + e), i.e. sigmoid(x)
    def first_order_gradient(dy):
        @tf.custom_gradient
        def first_order_custom(unused_x):
            def second_order_gradient(ddy):
                # Let's define the second-order gradient to be (1 - e)
                return ddy * (1 - e)
            return x_grad, second_order_gradient
        return dy * first_order_custom(x)
    return y, first_order_gradient
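
As an aside: the true second derivative of log1pexp is e / (1 + e)**2, which equals sigmoid(x) * sigmoid(-x). If you wanted the real thing instead of the (1 - e) dummy, a numerically stable inner function could look like this sketch (my addition, not part of the original answer):

def second_order_gradient(ddy):
    # Stable form of the true second derivative of log(1 + exp(x)):
    # e / (1 + e)**2 == sigmoid(x) * sigmoid(-x), finite even for large x.
    return ddy * tf.sigmoid(x) * tf.sigmoid(-x)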

To test it, run the following with log1pexp2 defined as above:

# Note: tf.gradients and tf.Session are TF 1.x (graph-mode) APIs.
x1 = tf.constant(1.)
y1 = log1pexp2(x1)
dy1 = tf.gradients(y1, x1)    # first-order gradient
ddy1 = tf.gradients(dy1, x1)  # second-order gradient

# For large x, tf.exp(x) overflows in float32, so (1 - e) becomes -inf.
x2 = tf.constant(100.)
y2 = log1pexp2(x2)
dy2 = tf.gradients(y2, x2)
ddy2 = tf.gradients(dy2, x2)

with tf.Session() as sess:
    print('x=1, dy1:', dy1[0].eval(session=sess))
    print('x=1, ddy1:', ddy1[0].eval(session=sess))
    print('x=100, dy2:', dy2[0].eval(session=sess))
    print('x=100, ddy2:', ddy2[0].eval(session=sess))

Result:

x=1, dy1: 0.7310586
x=1, ddy1: -1.7182817
x=100, dy2: 1.0
x=100, ddy2: -inf
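
Since tf.gradients and tf.Session only exist in TF 1.x, here is a sketch of the same check for TF 2.x using nested tf.GradientTape (my addition, assuming eager execution and the log1pexp2 definition above):

import tensorflow as tf

# Assumes log1pexp2 from above is defined in this module.
x = tf.constant(1.)
with tf.GradientTape() as t2:
    t2.watch(x)
    with tf.GradientTape() as t1:
        t1.watch(x)
        y = log1pexp2(x)
    dy = t1.gradient(y, x)   # first-order gradient
ddy = t2.gradient(dy, x)     # second-order gradient

print('x=1, dy:', dy.numpy())    # should match 0.7310586 above
print('x=1, ddy:', ddy.numpy())  # should match -1.7182817 above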

Upvotes: 2
