Reputation: 21
I keep thinking that I am about to understand custom gradients, but then I test out this example and I just cannot figure out what is going on. I am hoping somebody can walk me through exactly what is happening below. I think this essentially comes down to me not understanding specifically what "dy" is in the backward function.
v = tf.Variable(2.0)
with tf.GradientTape() as t:
    x = v*v
    output = x**2
print(t.gradient(output, v))
**tf.Tensor(32.0, shape=(), dtype=float32)**
Everything is good here and the gradient is as one would expect. I then test this example using a custom gradient which (given my understanding) could not possibly affect the gradient, since I have put such a massive threshold into clip_by_norm:
@tf.custom_gradient
def clip_gradients2(y):
    def backward(dy):
        return tf.clip_by_norm(dy, 20000000000000000000000000)
    return y**2, backward
v = tf.Variable(2.0)
with tf.GradientTape() as t:
    x = v*v
    output = clip_gradients2(x)
print(t.gradient(output, v))
tf.Tensor(4.0, shape=(), dtype=float32)
But it is reduced to 4, so this is somehow having an effect. How exactly is this resulting in a smaller gradient?
Upvotes: 2
Views: 134
Reputation: 11651
When writing a custom gradient, you must define the whole derivative calculation by yourself. Without your custom gradient, we have the following derivative:
d((v**2)**2)/dv = d(v**4)/dv = 4*v**3 = 32 when v = 2
When you override the gradient calculation, your backward just passes dy through (the clip threshold is far too large to change anything), so the derivative of y**2 never enters the chain and all that is left is
d(v**2)/dv = 2*v = 4 when v = 2
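You can see this directly by printing dy inside backward (a quick check, not part of the original code): the dy that arrives is just d(output)/d(output) = 1.0, the clip leaves it untouched, and nothing accounts for the derivative of y**2:

import tensorflow as tf

@tf.custom_gradient
def clip_gradients2(y):
    def backward(dy):
        tf.print("incoming dy:", dy)  # prints 1, i.e. d(output)/d(output)
        return tf.clip_by_norm(dy, 20000000000000000000000000)
    return y**2, backward

v = tf.Variable(2.0)
with tf.GradientTape() as t:
    x = v*v
    output = clip_gradients2(x)
# the tape multiplies the returned dy (still 1.0) by d(x)/dv = 2*v = 4
print(t.gradient(output, v))  # tf.Tensor(4.0, shape=(), dtype=float32)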
You need to calculate the derivative in your function, i.e.:
@tf.custom_gradient
def clip_gradients2(y):
    def backward(dy):
        # multiply the incoming gradient by the local derivative d(y**2)/dy = 2*y
        dy = dy * (2*y)
        return tf.clip_by_norm(dy, 20000000000000000000000000)
    return y**2, backward
to get the desired behavior.
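For reference (a quick check, using the same v = 2.0 setup as in the question), the corrected function restores the original gradient:

v = tf.Variable(2.0)
with tf.GradientTape() as t:
    x = v*v
    output = clip_gradients2(x)  # the corrected version above
# backward returns dy * 2*x = 1 * 8, then the tape applies d(x)/dv = 2*v = 4, giving 32
print(t.gradient(output, v))  # tf.Tensor(32.0, shape=(), dtype=float32)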
Upvotes: 2