npCompleteNoob

Reputation: 566

TensorFlow: "No gradients provided for any variable" with tf.to_double

I have a function that takes two TensorFlow vectors and a scalar threshold and returns a TensorFlow operation. The following version throws "ValueError: No gradients provided for any variable".

def mse(expected, probs, threshold):
    preds = tf.to_double(probs >= threshold)
    loss_vect = tf.square(expected - preds)
    loss = -tf.reduce_mean(loss_vect)
    return loss

However, if I remove the first line, resulting in the following version of the function, no error is thrown.

def mse(expected, probs, threshold):
    loss_vect = tf.square(expected - probs)
    loss = -tf.reduce_mean(loss_vect)
    return loss

The context in which I call the function is below. The function above is passed in as loss_func. For act_func, I pass in a function that returns a tf.sigmoid operation (sketched after the class below).

class OneLayerNet(object):
    def __init__(self, num_feats, num_outputs, act_func, threshold, loss_func, optimizer, batch_size=8, epochs=100, eta=0.01, reg_const=0):
        self.batch_size = batch_size
        self.epochs = epochs
        self.eta = eta
        self.reg_const = reg_const

        self.x = tf.sparse_placeholder(tf.float64, name="placeholderx") # num_sents x num_feats
        self.y = tf.placeholder(tf.float64, name="placeholdery") # 1 x num_sents
        self.w = tf.get_variable("W", shape=[num_feats, num_outputs], initializer=tf.contrib.layers.xavier_initializer(), dtype=tf.float64)
        self.b = tf.Variable(tf.zeros([num_outputs], dtype=tf.float64))

        self.probs = act_func(self.x, self.w, self.b)
        self.loss = loss_func(self.y, self.probs, threshold)
        self.optimizer = optimizer(self.eta, self.loss)
        self.session = tf.Session()
        self.session.run(tf.global_variables_initializer())
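
For reference, the act_func and optimizer I pass in look roughly like this (illustrative sketches; the exact bodies are not the issue):

def act_func(x, w, b):
    # x is a SparseTensor, so use the sparse-dense matmul
    return tf.sigmoid(tf.sparse_tensor_dense_matmul(x, w) + b)

def optimizer(eta, loss):
    return tf.train.GradientDescentOptimizer(eta).minimize(loss)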

From other answers, I understand that the ValueError I'm getting means that the path between my weight vector w and my optimizer is broken. I'm wondering why the path breaks when I add the tf.to_double call.
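
A minimal check (TensorFlow 1.x, as above; the 0.5 threshold is arbitrary) reproduces the broken path in isolation:

import tensorflow as tf

probs = tf.placeholder(tf.float64, shape=[None])
preds = tf.to_double(probs >= 0.5)

# No gradient is registered for the comparison op, so tf.gradients
# returns [None] instead of a tensor.
print(tf.gradients(preds, probs))  # [None]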

Upvotes: 1

Views: 163

Answers (1)

P-Gn

Reputation: 24591

The problem does not come from to_double but from the fact that you are thresholding probs.

When you compute probs >= threshold, the result is binary. Computing the gradient of this expression w.r.t. probs does not make much sense: it is 0 almost everywhere, and undefined (infinite) at the threshold itself.

Converting the result to double unfortunately does not change that: the cast cannot restore a gradient that the comparison never had.
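
If you need hard predictions for evaluation, one workaround (a sketch, assuming TF 1.x and an optimizer that minimizes; hard_predictions is just an illustrative name) is to keep the loss a function of the continuous probs, as in your second version, and apply the threshold only outside the training path:

import tensorflow as tf

def mse(expected, probs):
    # Differentiable loss on the raw probabilities, so gradients
    # can reach the weights.
    return tf.reduce_mean(tf.square(expected - probs))

def hard_predictions(probs, threshold):
    # Thresholding only for evaluation; this op sits outside the
    # gradient path, so its lack of a gradient is harmless.
    return tf.to_double(probs >= threshold)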

Upvotes: 1
