bxshi

Reputation: 2292

TensorFlow's Estimator froze with low CPU usage

I updated my TF to v1.0rc1, and Estimator.evaluate no longer works because it freezes at Restoring model.... I tried to reproduce the problem, and the following sample code makes TF freeze with about 220% (2 CPUs) CPU usage and no output at all. Any idea why this happens? Thanks!

import tensorflow as tf
from tensorflow.contrib.layers.python.layers.optimizers import optimize_loss
from tensorflow.contrib.learn.python.learn.estimators import model_fn
from tensorflow.contrib.learn.python.learn.estimators.estimator import Estimator
from tensorflow.python.framework import ops


def main(_):
    def func(features, targets, mode, params):
        idx = tf.concat([features['a'], features['b']], axis=1)

        embedding = tf.get_variable("embed", [10, 20], dtype=tf.float32)

        pred = tf.reduce_sum(tf.nn.embedding_lookup(embedding, idx))

        train_op = optimize_loss(loss=pred,
                                 global_step=tf.train.get_global_step(),
                                 learning_rate=0.001,
                                 optimizer='Adam',
                                 variables=tf.trainable_variables(),
                                 name="training_loss_optimizer")

        eval_metric_dict = dict()
        eval_metric_dict['metric'] = pred

        return model_fn.ModelFnOps(mode=mode,
                                   predictions=pred,
                                   loss=pred,
                                   train_op=train_op,
                                   eval_metric_ops=eval_metric_dict)

    model = Estimator(func, params={})

    model.fit(
        input_fn=lambda: (
            {'a': ops.convert_to_tensor([[1, 2, 3, 4, 5]]), 'b': ops.convert_to_tensor([[2, 3, 4, 3, 5]])},
            None), steps=1)
    model.evaluate(
        input_fn=lambda: (
            {'a': ops.convert_to_tensor([[1, 2, 3, 4, 5]]), 'b': ops.convert_to_tensor([[2, 3, 4, 3, 5]])},
            None))


if __name__ == "__main__":
    tf.app.run()

Upvotes: 1

Views: 281

Answers (1)

Allen Lavoie

Reputation: 5808

By default Estimator.evaluate assumes queue-based input, and will continue evaluating until the input pipeline is exhausted. When there is no queue-based input, this means it will loop forever. The fix is easy: simply provide a steps argument to evaluate.
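For example, reusing the input_fn from the question, the evaluation call could be bounded like this (a minimal sketch; the rest of the script stays the same):

model.evaluate(
    input_fn=lambda: (
        {'a': ops.convert_to_tensor([[1, 2, 3, 4, 5]]),
         'b': ops.convert_to_tensor([[2, 3, 4, 3, 5]])},
        None),
    steps=1)  # run exactly one evaluation step instead of waiting for a queue that never empties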

Upvotes: 1
