Reputation: 566
I have a function that takes in two TensorFlow vectors and a scalar threshold, and returns a TensorFlow operation. The following version throws a "ValueError: No gradients provided for any variable".
def mse(expected, probs, threshold):
    preds = tf.to_double(probs >= threshold)
    loss_vect = tf.square(expected - preds)
    loss = -tf.reduce_mean(loss_vect)
    return loss
However if I remove the first line, resulting in the following version of the function, no error is thrown.
def mse(expected, probs, threshold):
    loss_vect = tf.square(expected - probs)
    loss = -tf.reduce_mean(loss_vect)
    return loss
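For what it's worth, a minimal check outside my class shows the same contrast at graph-construction time (the shapes and the 0.5 threshold here are just for illustration):

import tensorflow as tf

y = tf.placeholder(tf.float64, shape=[None])
p = tf.placeholder(tf.float64, shape=[None])

# Gradient w.r.t. p exists for the plain squared error...
grads_ok = tf.gradients(tf.reduce_mean(tf.square(y - p)), p)
# ...but is None once p is thresholded and cast.
grads_thresholded = tf.gradients(
    tf.reduce_mean(tf.square(y - tf.to_double(p >= 0.5))), p)

print(grads_ok)           # [<tf.Tensor 'gradients/...'>]
print(grads_thresholded)  # [None]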
The context in which I call the function is below. The function above is passed in as loss_func. For act_func, I pass in a function that returns a tf.sigmoid operation.
class OneLayerNet(object):
def __init__(self, num_feats, num_outputs, act_func, threshold, loss_func, optimizer, batch_size=8, epochs=100, eta=0.01, reg_const=0):
self.batch_size = batch_size
self.epochs = epochs
self.eta = eta
self.reg_const = reg_const
self.x = tf.sparse_placeholder(tf.float64, name="placeholderx") # num_sents x num_feats
self.y = tf.placeholder(tf.float64, name="placeholdery") # 1 x num_sents
self.w = tf.get_variable("W", shape=[num_feats, num_outputs], initializer=tf.contrib.layers.xavier_initializer(), dtype=tf.float64)
self.b = tf.Variable(tf.zeros([num_outputs], dtype=tf.float64))
self.probs = act_func(self.x, self.w, self.b)
self.loss = loss_func(self.y, self.probs, threshold)
self.optimizer = optimizer(self.eta, self.loss)
self.session = tf.Session()
self.session.run(tf.global_variables_initializer())
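For completeness, act_func and optimizer are along these lines (the gradient-descent optimizer shown here is only illustrative; the exact choice does not seem to matter for the error):

def act_func(x, w, b):
    # x is a SparseTensor, so a sparse-dense matmul is used before the sigmoid.
    return tf.sigmoid(tf.sparse_tensor_dense_matmul(x, w) + b)

def optimizer(eta, loss):
    # minimize() is where the "No gradients provided" ValueError is raised.
    return tf.train.GradientDescentOptimizer(eta).minimize(loss)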
From other answers, I understand that the ValueError I'm getting means that the path between my weight vector w and my optimizer is broken. I'm wondering why the path breaks when I add the tf.to_double call.
Upvotes: 1
Views: 163
Reputation: 24591
The problem does not come from to_double, but from the fact that you are thresholding probs.

When you compute probs >= threshold, the result is binary. Computing the gradient of this expression w.r.t. probs does not make much sense, because it is 0 almost everywhere, except at the threshold, where it is infinite. Converting the result to double will unfortunately not change that.
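If you still want to train on a mean-squared-error loss, the usual workaround is to compute the loss on probs directly and apply the threshold only outside the gradient path, e.g. for hard predictions or metrics. A rough sketch (the helper name is just illustrative, and I dropped the minus sign on the assumption that you want to minimize the error rather than its negative):

def mse(expected, probs, threshold):
    # Differentiable loss: keep the continuous probabilities in the graph.
    return tf.reduce_mean(tf.square(expected - probs))

def hard_predictions(probs, threshold):
    # Thresholding is fine here, since no gradient needs to flow through
    # predictions used only for evaluation.
    return tf.to_double(probs >= threshold)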
Upvotes: 1