Reputation: 566
I have a function that takes in two TensorFlow vectors and a scalar threshold, and returns a TensorFlow operation. The following version throws a "ValueError: No gradients provided for any variable".
def mse(expected, probs, threshold):
    preds = tf.to_double(probs >= threshold)
    loss_vect = tf.square(expected - preds)
    loss = -tf.reduce_mean(loss_vect)
    return loss
However if I remove the first line, resulting in the following version of the function, no error is thrown.
def mse(expected, probs, threshold):
    loss_vect = tf.square(expected - probs)
    loss = -tf.reduce_mean(loss_vect)
    return loss
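For what it's worth, a minimal check outside my class shows the same contrast at graph-construction time (the shapes and the 0.5 threshold here are just for illustration):

import tensorflow as tf

y = tf.placeholder(tf.float64, shape=[None])
p = tf.placeholder(tf.float64, shape=[None])

# Gradient w.r.t. p exists for the plain squared error...
grads_ok = tf.gradients(tf.reduce_mean(tf.square(y - p)), p)
# ...but is None once p is thresholded and cast.
grads_thresholded = tf.gradients(
    tf.reduce_mean(tf.square(y - tf.to_double(p >= 0.5))), p)

print(grads_ok)           # [<tf.Tensor 'gradients/...'>]
print(grads_thresholded)  # [None]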
The context in which I call the function is below. The function above is passed in as loss_func. For act_func, I pass in a function that returns a tf.sigmoid operation.
class OneLayerNet(object):
def __init__(self, num_feats, num_outputs, act_func, threshold, loss_func, optimizer, batch_size=8, epochs=100, eta=0.01, reg_const=0):
self.batch_size = batch_size
self.epochs = epochs
self.eta = eta
self.reg_const = reg_const
self.x = tf.sparse_placeholder(tf.float64, name="placeholderx") # num_sents x num_feats
self.y = tf.placeholder(tf.float64, name="placeholdery") # 1 x num_sents
self.w = tf.get_variable("W", shape=[num_feats, num_outputs], initializer=tf.contrib.layers.xavier_initializer(), dtype=tf.float64)
self.b = tf.Variable(tf.zeros([num_outputs], dtype=tf.float64))
self.probs = act_func(self.x, self.w, self.b)
self.loss = loss_func(self.y, self.probs, threshold)
self.optimizer = optimizer(self.eta, self.loss)
self.session = tf.Session()
self.session.run(tf.global_variables_initializer())
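For completeness, act_func and optimizer are along these lines (the gradient-descent optimizer shown here is only illustrative; the exact choice does not seem to matter for the error):

def act_func(x, w, b):
    # x is a SparseTensor, so a sparse-dense matmul is used before the sigmoid.
    return tf.sigmoid(tf.sparse_tensor_dense_matmul(x, w) + b)

def optimizer(eta, loss):
    # minimize() is where the "No gradients provided" ValueError is raised.
    return tf.train.GradientDescentOptimizer(eta).minimize(loss)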
From other answers, I understand that the ValueError I'm getting means that the path between my weight vector w and my optimizer is broken. I'm wondering why the path breaks when I add the tf.to_double call.
Upvotes: 1
Views: 163
Reputation: 24591
The problem does not come from to_double, but from the fact that you are thresholding probs.

When you compute probs >= threshold, the result is binary. Computing the gradient of this expression w.r.t. probs does not make much sense, because it is 0 almost everywhere, except at the threshold, where it is infinite. Converting the result to double will unfortunately not change that.
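If you still want to train on a mean-squared-error loss, the usual workaround is to compute the loss on probs directly and apply the threshold only outside the gradient path, e.g. for hard predictions or metrics. A rough sketch (the helper name is just illustrative, and I dropped the minus sign on the assumption that you want to minimize the error rather than its negative):

def mse(expected, probs, threshold):
    # Differentiable loss: keep the continuous probabilities in the graph.
    return tf.reduce_mean(tf.square(expected - probs))

def hard_predictions(probs, threshold):
    # Thresholding is fine here, since no gradient needs to flow through
    # predictions used only for evaluation.
    return tf.to_double(probs >= threshold)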
Upvotes: 1