Reputation: 481
This is based on the example quoted on TensorFlow's website here: https://www.tensorflow.org/api_docs/python/tf/custom_gradient
@tf.custom_gradient
def op_with_fused_backprop(x):
  y, x_grad = fused_op(x)
  def first_order_gradient(dy):
    @tf.custom_gradient
    def first_order_custom(unused_x):
      def second_order_and_transpose(ddy):
        return second_order_for_x(...), gradient_wrt_dy(...)
      return x_grad, second_order_and_transpose
    return dy * first_order_custom(x)
  return y, first_order_gradient
There is a lack of detail on why second_order_and_transpose(ddy) returns two objects. According to the documentation of tf.custom_gradient, the grad_fn (here, second_order_and_transpose()) should return a list of Tensors containing the derivatives of dy with respect to unused_x. It is also unclear why they named it unused_x. Does anyone have any insight into this example, or into creating custom gradients for higher-order derivatives in general?
Upvotes: 2
Views: 461
Reputation: 2159
There is a lack of detail on why second_order_and_transpose(ddy) returns two objects.
Based on some examples I played with, I believe you are correct. The official doc is somewhat ambiguous (or incorrect) here: second_order_and_transpose(ddy) should return only one object, namely the calculated second-order gradient.
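In other words, the innermost function in the official snippet would look like this (a sketch reusing the names from the quoted example; second_order_for_x stands in for whatever actually computes the second-order term):

def second_order_and_transpose(ddy):
  # Per the grad_fn contract, return only the gradient(s) with
  # respect to the inputs of first_order_custom -- one object,
  # not a pair.
  return second_order_for_x(...)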
It is also unclear why they named it unused_x.
That is the tricky part. The name unused_x explains itself: you are never going to use it. The goal here is to wrap your second-order calculation in a function called first_order_custom. You compute the gradient of x inside fused_op and return that (x_grad) as the first return value, instead of deriving anything from unused_x.
To make this clearer, here is an example extended from the official document that defines a second-order gradient for log1pexp:
NOTE: The second-order gradient is not numerically stable, so let's use (1 - tf.exp(x)) to stand in for it, just to make our life easier. (For reference, the true second derivative of log(1 + e^x) is e^x / (1 + e^x)^2; the (1 - e) below is arbitrary and only for demonstration.)
@tf.custom_gradient
def log1pexp2(x):
  e = tf.exp(x)
  y = tf.math.log(1 + e)
  x_grad = 1 - 1 / (1 + e)
  def first_order_gradient(dy):
    @tf.custom_gradient
    def first_order_custom(unused_x):
      def second_order_gradient(ddy):
        # Define the second-order gradient to be (1 - e),
        # scaled by the incoming upstream gradient ddy.
        return ddy * (1 - e)
      return x_grad, second_order_gradient
    return dy * first_order_custom(x)
  return y, first_order_gradient
To test the script, simply run:
import tensorflow as tf

# Note: this test script uses TF1-style graph mode (tf.gradients, tf.Session).
@tf.custom_gradient
def log1pexp2(x):
  e = tf.exp(x)
  y = tf.math.log(1 + e)
  x_grad = 1 - 1 / (1 + e)
  def first_order_gradient(dy):
    @tf.custom_gradient
    def first_order_custom(unused_x):
      def second_order_gradient(ddy):
        # Define the second-order gradient to be (1 - e),
        # scaled by the incoming upstream gradient ddy.
        return ddy * (1 - e)
      return x_grad, second_order_gradient
    return dy * first_order_custom(x)
  return y, first_order_gradient

x1 = tf.constant(1.)
y1 = log1pexp2(x1)
dy1 = tf.gradients(y1, x1)
ddy1 = tf.gradients(dy1, x1)

x2 = tf.constant(100.)
y2 = log1pexp2(x2)
dy2 = tf.gradients(y2, x2)
ddy2 = tf.gradients(dy2, x2)

with tf.Session() as sess:
  print('x=1, dy1:', dy1[0].eval(session=sess))
  print('x=1, ddy1:', ddy1[0].eval(session=sess))
  print('x=100, dy2:', dy2[0].eval(session=sess))
  print('x=100, ddy2:', ddy2[0].eval(session=sess))
Result:
x=1, dy1: 0.7310586
x=1, ddy1: -1.7182817
x=100, dy2: 1.0
x=100, ddy2: -inf
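If you are on TensorFlow 2.x, where tf.Session is gone, the same check can be run eagerly with nested tf.GradientTape contexts. A minimal sketch, assuming the log1pexp2 defined above is in scope:

import tensorflow as tf

# Assumes log1pexp2 from the snippet above is already defined.
x = tf.constant(1.)
with tf.GradientTape() as outer_tape:
  outer_tape.watch(x)
  with tf.GradientTape() as inner_tape:
    inner_tape.watch(x)
    y = log1pexp2(x)
  dy = inner_tape.gradient(y, x)  # first-order gradient
ddy = outer_tape.gradient(dy, x)  # second-order gradient
print('dy:', dy.numpy())    # expect ~0.7310586, matching the session run
print('ddy:', ddy.numpy())  # expect ~-1.7182817, matching the session run

The inner tape records the forward pass, and because inner_tape.gradient is called inside the outer tape's context, the first-order gradient computation itself is recorded, letting the outer tape differentiate it again.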
Upvotes: 2