Reputation: 13
I am trying to understand how TensorFlow differentiation behaves when applied to individual elements of a vector. Here is my code:
import tensorflow as tf

w = tf.Variable([[1.0, 2.3], [4.5, 6.7]], dtype=tf.float32)
x = tf.constant([[1.2], [2.3]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y = tf.matmul(w, x)
d = tape.gradient(y[0], w)
y is a 2x1 tensor. When I differentiate y[0], i.e. a single element of y, w.r.t. w, I get None: d is None here. I can't understand this behavior. y[0] depends on w, so why am I still getting None?
Upvotes: 0
Views: 130
Reputation: 11631
You're getting None because accessing a sub-element of a tensor is itself an operation, and it needs to be done inside the tf.GradientTape context. Otherwise, the tape does not know the history that led to that variable.
Your example is akin to doing this:
w = tf.Variable([[1.0, 2.3], [4.5, 6.7]], dtype=tf.float32)
x = tf.constant([[1.2], [2.3]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y = tf.matmul(w, x)
d = tape.gradient(y + 2, w)
The tape did not watch the operation y + 2, so the gradient cannot be calculated.
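For contrast, here is a minimal sketch (assuming TensorFlow 2.x eager execution) where the same y + 2 is computed inside the tape, so the operation gets recorded and the gradient comes back non-None:

import tensorflow as tf

w = tf.Variable([[1.0, 2.3], [4.5, 6.7]], dtype=tf.float32)
x = tf.constant([[1.2], [2.3]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y = tf.matmul(w, x)
    z = y + 2  # computed inside the context, so the tape records it
d = tape.gradient(z, w)  # a 2x2 tensor now, not None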
To get the gradient of only one element, you need to explicitly get that element in the tape's context:
w = tf.Variable([[1.0, 2.3], [4.5, 6.7]], dtype=tf.float32)
x = tf.constant([[1.2], [2.3]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y = tf.matmul(w, x)
    y0 = y[0]  # the slicing is now recorded by the tape
d = tape.gradient(y0, w)
And then you get the gradient of y[0] w.r.t. w:
>>> d
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[1.2, 2.3],
[0. , 0. ]], dtype=float32)>
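If you want the gradient of every element of y rather than just y[0], a sketch using tape.jacobian (again assuming TensorFlow 2.x) avoids the slicing altogether:

import tensorflow as tf

w = tf.Variable([[1.0, 2.3], [4.5, 6.7]], dtype=tf.float32)
x = tf.constant([[1.2], [2.3]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y = tf.matmul(w, x)
jac = tape.jacobian(y, w)  # shape (2, 1, 2, 2): one 2x2 gradient per element of y
d0 = jac[0, 0]             # gradient of y[0] w.r.t. w, matching the result above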
Upvotes: 1