Reputation: 11
I am implementing the CR-CNN paper in TensorFlow. The loss function used in the paper has terms that depend on the run-time values of tensors and on the true labels. As far as I know, TensorFlow creates a static computational graph and then executes it in a session, so I am finding it hard to implement the prediction and loss functions from the paper, since both change dynamically at run time. I tried using tf.cond() in my code, but that resulted in 'None' as the gradient, so my network is not getting trained at all.
class_scores = tf.matmul(pooled, W_classes)
n_correct = tf.Variable(0, trainable=True)
for t in xrange(batch_size):
    max_arg = tf.cast(tf.argmax(class_scores[t], 1), tf.int32)
    #true_class = tf.constant(0)
    true_class = tf.cast(tf.argmax(Y[t], 1), tf.int32)
    pred_class = tf.Variable(0, trainable=True)
    value = class_scores[t][max_arg]
    tf.cond(value <= 0, lambda: tf.assign(pred_class, 0), lambda: tf.assign(pred_class, max_arg + 1))
    tf.cond(tf.equal(true_class, pred_class), lambda: tf.add(n_correct, 1), lambda: tf.add(n_correct, 0))
    #print(value)
accuracy = tf.cast(n_correct, tf.float32) / tf.cast(batch_size, tf.float32)
Here I am calculating the accuracy by counting the number of correct predictions.
I take a similar approach for the loss function as well:
gamma = tf.constant(2.0)
m_plus = tf.constant(2.5)
m_minus = tf.constant(0.5)
batch_loss = tf.Variable(0.0, trainable=True)
for t in xrange(batch_size):
    max_arg = tf.cast(tf.argmax(class_scores[t], 1), tf.int32)
    true_class = tf.cast(tf.argmax(Y[t], 1), tf.int32)
    top2_val, top2_i = tf.nn.top_k(class_scores[t], 2, sorted=True)
    pred_class = tf.Variable(0, trainable=True)
    true_score = tf.Variable(0.0, trainable=True)
    neg_score = tf.Variable(0.0, trainable=True)
    value = class_scores[t][max_arg]
    tf.cond(value <= 0, lambda: tf.assign(pred_class, 0), lambda: tf.assign(pred_class, max_arg + 1))
    tf.cond(tf.equal(true_class, 0), lambda: tf.assign(true_score, 0), lambda: tf.assign(true_score, class_scores[t][true_class - 1]))
    tf.cond(tf.equal(true_class, 0), lambda: tf.assign(neg_score, value),
            lambda: tf.cond(tf.equal(true_class, pred_class),
                            lambda: tf.assign(neg_score, top2_val[1]),
                            lambda: tf.assign(neg_score, value)))
    example_loss = tf.Variable(0.0, trainable=True)
    tf.cond(tf.equal(true_class, 0),
            lambda: tf.assign(example_loss, tf.log(1 + tf.exp(tf.multiply(gamma, m_minus + neg_score)))),
            lambda: tf.assign(example_loss, tf.log(1 + tf.exp(tf.multiply(gamma, m_plus - true_score)))
                              + tf.log(1 + tf.exp(tf.multiply(gamma, m_minus + neg_score)))))
    batch_loss = tf.add(batch_loss, example_loss)
    #print(neg_score)
batch_loss = batch_loss / batch_size
train_step = tf.train.GradientDescentOptimizer(learning_rate=lambda_t).minimize(batch_loss)
But the network is not getting trained. Can anyone suggest how to do this in TensorFlow?
Upvotes: 1
Views: 2626
Reputation: 5206
There are a few problems with this code, and as-is it just will not work. I recommend you try using TensorFlow's eager execution, as the conceptual problems you have here do not exist there (you don't need tf.cond or tf.Variable to solve your problem, for example).
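For instance, here is a minimal sketch of what the per-example loss could look like under eager execution (assuming TF 1.7+, where tf.enable_eager_execution is available; crcnn_example_loss is a hypothetical helper, and it glosses over your value <= 0 edge case):

import tensorflow as tf

tf.enable_eager_execution()  # must run before building any other TF ops

def crcnn_example_loss(scores, true_class, gamma=2.0, m_plus=2.5, m_minus=0.5):
    # scores: 1-D tensor of class scores for one example.
    # true_class: Python int; 0 denotes the artificial "other" class.
    # Eager execution evaluates tensors immediately, so ordinary Python
    # control flow replaces tf.cond / tf.assign.
    if true_class == 0:
        # only the negative term applies for the "other" class
        neg_score = tf.reduce_max(scores)
        return tf.log(1 + tf.exp(gamma * (m_minus + neg_score)))
    true_score = scores[true_class - 1]
    top2_val, top2_idx = tf.nn.top_k(scores, k=2, sorted=True)
    # competitive negative: second-best score if the best IS the true class
    if int(top2_idx[0]) == true_class - 1:
        neg_score = top2_val[1]
    else:
        neg_score = top2_val[0]
    return (tf.log(1 + tf.exp(gamma * (m_plus - true_score)))
            + tf.log(1 + tf.exp(gamma * (m_minus + neg_score))))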
The issue with how this code example is using tf.cond is that tf.cond is essentially functional (it adds ops to the graph which only get executed when you use the return value of tf.cond). So your code will need to chain the tf.conds somehow (probably via tf.control_dependencies) to make them execute.
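For example, here is a sketch of the pred_class computation from your loop that consumes the value tf.cond returns instead of assigning to a variable (value, max_arg, and true_class are the tensors from your loop body):

# tf.cond returns a tensor; any downstream op that consumes pred_class
# forces the conditional to actually execute in the graph.
pred_class = tf.cond(value <= 0,
                     lambda: tf.constant(0, dtype=tf.int32),
                     lambda: max_arg + 1)
is_correct = tf.cast(tf.equal(true_class, pred_class), tf.float32)
# accumulate is_correct across the batch (e.g. with tf.add_n) instead of
# assigning into an n_correct variable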
However, you also use tf.Variables during your training example. TensorFlow cannot backprop through assignments to a tf.Variable, so instead you need to replace your calls to tf.assign and friends with returning the new value of the variable and using it from Python.
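Along those lines, here is a sketch of your per-example loss written as pure tensor ops, with every tf.Variable/tf.assign pair replaced by the value a tf.cond returns (scores_t stands for class_scores[t] and y_t for Y[t]; gamma, m_plus, m_minus as in your code — a sketch, not the only way to structure it):

# no tf.Variable / tf.assign anywhere, so gradients flow through everything
max_val = tf.reduce_max(scores_t)
max_arg = tf.cast(tf.argmax(scores_t, axis=0), tf.int32)
true_class = tf.cast(tf.argmax(y_t, axis=0), tf.int32)
top2_val, _ = tf.nn.top_k(scores_t, k=2, sorted=True)

is_other = tf.equal(true_class, 0)
pred_class = tf.cond(max_val <= 0,
                     lambda: tf.constant(0, dtype=tf.int32),
                     lambda: max_arg + 1)
true_score = tf.cond(is_other,
                     lambda: tf.constant(0.0),
                     lambda: scores_t[true_class - 1])
neg_score = tf.cond(is_other,
                    lambda: max_val,
                    lambda: tf.cond(tf.equal(true_class, pred_class),
                                    lambda: top2_val[1],
                                    lambda: max_val))
neg_term = tf.log(1 + tf.exp(gamma * (m_minus + neg_score)))
example_loss = tf.cond(is_other,
                       lambda: neg_term,
                       lambda: tf.log(1 + tf.exp(gamma * (m_plus - true_score))) + neg_term)

batch_loss is then just the mean of the example_loss tensors over the batch, and the optimizer can differentiate through it because nothing is assigned.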
Upvotes: 1