Reputation: 23
I'm trying to build a simple net of 2 input neurons (+1 bias) feeding into 1 output neuron to teach it the "and" function. It's based on the MNIST classification example, so it may be overly complex for the task, but for me it's about the general structure of such nets, so please don't say "you could just do this in numpy" or the like; it's about TensorFlow NNs for me. Here is the code:
import tensorflow as tf
import numpy as np

tf.logging.set_verbosity(tf.logging.INFO)

def model_fn(features, labels, mode):
    input_layer = tf.reshape(features["x"], [-1, 2])
    output_layer = tf.layers.dense(inputs=input_layer, units=1, activation=tf.nn.relu, name="output_layer")
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=output_layer)
    loss = tf.losses.mean_squared_error(labels=labels, predictions=output_layer)
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
        train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
    eval_metrics_ops = {"accuracy": tf.metrics.accuracy(labels=labels, predictions=output_layer)}
    return tf.estimator.EstimatorSpec(mode=mode, predictions=output_layer, loss=loss)

def main(unused_arg):
    train_data = np.asarray(np.reshape([[0,0],[0,1],[1,0],[1,1]], [4,2]))
    train_labels = np.asarray(np.reshape([0,0,0,1], [4,1]))
    eval_data = train_data
    eval_labels = train_labels
    classifier = tf.estimator.Estimator(model_fn=model_fn, model_dir="/tmp/NN_AND")
    tensors_to_log = {"The output:": "output_layer"}
    logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=10)
    train_input_fn = tf.estimator.inputs.numpy_input_fn(x={"x": train_data}, y=train_labels, batch_size=10, num_epochs=None, shuffle=True)
    classifier.train(input_fn=train_input_fn, steps=2000, hooks=[logging_hook])
    eval_input_fn = tf.estimator.inputs.numpy_input_fn(x={"x": eval_data}, y=eval_labels, batch_size=1, shuffle=False)
    eval_results = classifier.evaluate(input_fn=eval_input_fn)
    print(eval_results)

if __name__ == "__main__":
    tf.app.run()
Upvotes: 0
Views: 311
Reputation: 839
I've made a few slight modifications to your code that enable it to learn the and function:
1) Change your train_data to a float32 representation:

train_data = np.asarray(np.reshape([[0,0],[0,1],[1,0],[1,1]], [4,2]), dtype=np.float32)
2) Remove the relu activation from the output layer. Generally speaking, using relus in the output layer is not recommended: a relu can "die", at which point its output and all its gradients are zero, which in turn makes any further learning impossible.
output_layer = tf.layers.dense(inputs=input_layer, units=1, activation=None, name="output_layer")
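To see the dead-relu point concretely, here is a minimal TF 1.x sketch (my own illustration, not part of your code): with a negative pre-activation the relu outputs 0 and its gradient is 0 as well, so gradient descent can never move the weight again.

import tensorflow as tf

x = tf.constant(-2.0)
w = tf.Variable(1.0)
y = tf.nn.relu(w * x)          # pre-activation is -2.0 -> relu output is 0
grad = tf.gradients(y, w)[0]   # relu gradient is 0 on the negative side

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([y, grad]))  # [0.0, 0.0] -- the weight never gets updated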
3) In your eval_metrics_ops, make sure you round the result so you can actually measure accuracy:
eval_metrics_ops = {"accuracy": tf.metrics.accuracy(labels=labels, predictions=tf.round(output_layer))}
4) Don't forget to pass the eval_metrics_ops dict you defined to the EstimatorSpec via its eval_metric_ops parameter:
return tf.estimator.EstimatorSpec(mode=mode, predictions=output_layer, loss=loss, eval_metric_ops=eval_metrics_ops)
In addition, to log the output of the last layer you should use the full tensor name (the dense layer's name scope plus the BiasAdd op that produces its output):
tensors_to_log = {"The output:": "output_layer/BiasAdd:0"}
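Putting the four changes together, here is a sketch of how the modified model_fn and training data look (assembled from the snippets above; the rest of your code stays as it was):

import tensorflow as tf
import numpy as np

def model_fn(features, labels, mode):
    input_layer = tf.reshape(features["x"], [-1, 2])
    # (2) no relu on the output layer
    output_layer = tf.layers.dense(inputs=input_layer, units=1, activation=None, name="output_layer")
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=output_layer)
    loss = tf.losses.mean_squared_error(labels=labels, predictions=output_layer)
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
        train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
    # (3) round the raw output so accuracy counts class matches
    eval_metrics_ops = {"accuracy": tf.metrics.accuracy(labels=labels, predictions=tf.round(output_layer))}
    # (4) actually pass the metrics to the EstimatorSpec
    return tf.estimator.EstimatorSpec(mode=mode, predictions=output_layer, loss=loss, eval_metric_ops=eval_metrics_ops)

# (1) float32 training data
train_data = np.asarray(np.reshape([[0,0],[0,1],[1,0],[1,1]], [4,2]), dtype=np.float32)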
Upvotes: 2