Reputation: 24616
I'm trying to create a minimal code snippet to understand the GradientDescentOptimizer class, to help me understand the TensorFlow API docs in more depth. I would like to provide some hardcoded inputs to the GradientDescentOptimizer, run the minimize() method, and inspect the output. So far, I have created the following:
loss_data = tf.Variable([2.0], dtype=tf.float32)
train_data = tf.Variable([20.0], dtype=tf.float32, name='train')
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.02)
gradients = optimizer.compute_gradients(loss_data, var_list=[train_data])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(gradients))
The error I get is:
TypeError: Fetch argument None has invalid type <class 'NoneType'>
However, I'm just guessing what the inputs could look like. Any pointers to help me understand what I should be passing to this function would be appreciated.
Some more context ...
I followed a similar process to understand activation functions by isolating them and treating them as a black-box where I send a range of inputs and inspect the corresponding outputs.
# I could have used a list of values but I wanted to experiment
# by passing in one parameter value at a time.
placeholder = tf.placeholder(dtype=tf.float32, shape=[1], name='placeholder')
activated = tf.nn.sigmoid(placeholder)
with tf.Session() as sess:
    x_y = {}
    for x in range(-10, 10):
        x_y[x] = sess.run(activated, feed_dict={placeholder: [x / 1.0]})
import matplotlib.pyplot as plt
%matplotlib inline
x, y = zip(*x_y.items())
plt.plot(x, y)
The above process was really useful for my understanding of activation functions and I was hoping to do something similar for optimizers.
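For reference, that same black-box check works without a session at all, since tf.nn.sigmoid just computes 1 / (1 + e^-x) elementwise. A minimal plain-Python sketch (the sigmoid helper name is mine) reproduces the values:

```python
import math

def sigmoid(x):
    # the same function tf.nn.sigmoid applies elementwise
    return 1.0 / (1.0 + math.exp(-x))

for x in range(-10, 10):
    print(x, sigmoid(x))  # sigmoid(0) == 0.5
```
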
Upvotes: 1
Views: 183
Reputation: 12938
See this discussion on GitHub. Basically, optimizer.compute_gradients returns a gradient of None for all variables not related to its input, and session.run throws this error if it gets a None value. The simple workaround: only tell it about variables that are relevant to the loss function you pass it:
gradients = optimizer.compute_gradients(loss_data, var_list=[loss_data])
Now when you run:
print(sess.run(gradients))
# [(gradient, input)]
# [(array([1.], dtype=float32), array([2.], dtype=float32))]
Which makes sense: you are calculating the gradient of x with respect to x, which is always 1.
For something more illustrative, let's define a loss op that changes with an input variable:
x = tf.Variable([1], dtype=tf.float32, name='input_x')
loss = tf.abs(x)
Your optimizer is defined the same, but your gradients op is now computed on this loss op:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.02)
gradients = optimizer.compute_gradients(loss, [x])
Finally, we can run this in a loop with different inputs and see how the gradient changes:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(-3, 3):
        grad = sess.run(gradients, feed_dict={x: (i,)})
        print(grad)
# [(gradient, input)]
# [(array([-1.], dtype=float32), array([-3.], dtype=float32))]
# [(array([-1.], dtype=float32), array([-2.], dtype=float32))]
# [(array([-1.], dtype=float32), array([-1.], dtype=float32))]
# [(array([0.], dtype=float32), array([0.], dtype=float32))]
# [(array([1.], dtype=float32), array([1.], dtype=float32))]
# [(array([1.], dtype=float32), array([2.], dtype=float32))]
This follows, as loss = -x if x is negative and loss = +x if x is positive, so d_loss/d_x will be -1 for a negative number, +1 for a positive number, and zero if the input is zero.
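You can sanity-check those signs without TensorFlow at all, using a central finite difference. This is a plain-Python sketch (the loss and grad helper names are mine, not TensorFlow API):

```python
def loss(x):
    # same function as tf.abs(x) on a scalar
    return abs(x)

def grad(x, eps=1e-6):
    # central-difference approximation of d(loss)/dx
    return (loss(x + eps) - loss(x - eps)) / (2 * eps)

for x in [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, grad(x))  # -1 for negatives, 0 at zero, +1 for positives
```
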
Upvotes: 1
Reputation: 24651
Your loss shouldn't be a variable (it is not a parameter of your model) but the result of an operation, e.g.
loss_data = train_data**2
Currently your loss does not depend on train_data, which explains why no gradient can be computed.
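With that change, compute_gradients would report d(train_data**2)/d(train_data) = 2 * train_data, i.e. 40.0 for the initial value of 20.0. A plain-Python finite-difference sketch (the loss and grad helper names are mine) confirms the arithmetic without needing a TensorFlow session:

```python
def loss(w):
    # the suggested loss: train_data squared
    return w ** 2

def grad(w, eps=1e-6):
    # central-difference approximation of d(loss)/dw
    return (loss(w + eps) - loss(w - eps)) / (2 * eps)

print(grad(20.0))  # ≈ 40.0, matching 2 * train_data
```
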
Upvotes: 2