Reputation: 13
I am trying to understand how TensorFlow differentiation behaves when applied to individual elements of a vector. Here is my code:
import tensorflow as tf

w = tf.Variable([[1.0, 2.3], [4.5, 6.7]], dtype=tf.float32)
x = tf.constant([[1.2], [2.3]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y = tf.matmul(w, x)
d = tape.gradient(y[0], w)
y is a 2x1 tensor. When I differentiate y[0], i.e. a single element of y, w.r.t. w, I get None: d is None here. I can't understand this behavior. y[0] depends on w, so why am I still getting None?
Upvotes: 0
Views: 130
Reputation: 11631
You're getting None because accessing a sub-element of a tensor is itself an operation, and it needs to be done inside the tf.GradientTape context. Otherwise, the tape does not know the history that led to that variable.
Your example is akin to doing this:
w = tf.Variable([[1.0, 2.3], [4.5, 6.7]], dtype=tf.float32)
x = tf.constant([[1.2], [2.3]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y = tf.matmul(w, x)
d = tape.gradient(y + 2, w)
The tape did not watch the operation y + 2, so the gradient cannot be calculated.
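For contrast, here is a minimal sketch (assuming TensorFlow 2.x eager execution) where the same y + 2 is computed inside the tape, so the operation gets recorded and the gradient comes back non-None:

import tensorflow as tf

w = tf.Variable([[1.0, 2.3], [4.5, 6.7]], dtype=tf.float32)
x = tf.constant([[1.2], [2.3]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y = tf.matmul(w, x)
    z = y + 2  # computed inside the context, so the tape records it
d = tape.gradient(z, w)  # a 2x2 tensor now, not None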
To get the gradient of only one element, you need to explicitly get that element in the tape's context:
w = tf.Variable([[1.0, 2.3], [4.5, 6.7]], dtype=tf.float32)
x = tf.constant([[1.2], [2.3]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y = tf.matmul(w, x)
    y0 = y[0]  # the slicing is now recorded by the tape
d = tape.gradient(y0, w)
And then you get the gradient of y[0] w.r.t. w:
>>> d
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[1.2, 2.3],
[0. , 0. ]], dtype=float32)>
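If you want the gradient of every element of y rather than just y[0], a sketch using tape.jacobian (again assuming TensorFlow 2.x) avoids the slicing altogether:

import tensorflow as tf

w = tf.Variable([[1.0, 2.3], [4.5, 6.7]], dtype=tf.float32)
x = tf.constant([[1.2], [2.3]], dtype=tf.float32)
with tf.GradientTape() as tape:
    y = tf.matmul(w, x)
jac = tape.jacobian(y, w)  # shape (2, 1, 2, 2): one 2x2 gradient per element of y
d0 = jac[0, 0]             # gradient of y[0] w.r.t. w, matching the result above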
Upvotes: 1