TensorFlow: troubleshooting the tf.estimator.inputs.numpy_input_fn function

Question

I'm running some tutorial code from a text classification example.

The script runs fine as a whole, but when I ran it line by line to understand what each step does, I got confused at this step:

test_input_fn = tf.estimator.inputs.numpy_input_fn(
  x={WORDS_FEATURE: x_test},
  y=y_test,
  num_epochs=1,
  shuffle=False)
classifier.train(input_fn=train_input_fn, steps=100)

I know that conceptually train_input_fn feeds data to the training function, but how can I call this function manually to inspect what's in it?

I've traced the code and found out the train_input_fn function feeds data to the following 2 variables:

features
Out[15]: {'words': }

labels
Out[16]:

When I tried to evaluate the features variable with sess.run(features), my terminal hung and stopped responding.

What's the right way to inspect the contents of variables like these?

Thank you!

DomJack · Accepted Answer

Based on the numpy_input_fn documentation and the hanging behaviour, I imagine the underlying implementation depends on a queue runner. A sess.run call hangs when the queue runners haven't been started, because it waits on a queue that nothing is filling. Try modifying your session-running script to something like the following, based on this guide:

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    try:
        for step in range(1000000):
            if coord.should_stop():
                break
            features_data = sess.run(features)
            print(features_data)

    except Exception as e:
        # Report exceptions to the coordinator.
        coord.request_stop(e)
    finally:
        # Terminate as usual. It is safe to call `coord.request_stop()` twice.
        coord.request_stop()
        coord.join(threads)
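
If the goal is only to see what the batches look like, a queue-free stand-in in plain NumPy may be enough. The sketch below is not TensorFlow's implementation. It assumes that with shuffle=False and num_epochs=1 the input function makes one in-order pass over the arrays, and that the last batch may be smaller than the rest. The helper name preview_batches and the sample arrays are hypothetical; the default batch size of 128 matches numpy_input_fn's documented default.

```python
import numpy as np

WORDS_FEATURE = "words"  # feature key used in the question


def preview_batches(x, y, batch_size=128):
    """Yield (features, labels) pairs in order, for a single pass.

    Hypothetical helper: it only imitates the shape of the data an
    unshuffled, single-epoch input function is assumed to produce.
    """
    for start in range(0, len(y), batch_size):
        stop = start + batch_size
        # Slice every feature array with the same row range as the labels.
        features = {key: value[start:stop] for key, value in x.items()}
        yield features, y[start:stop]


# Made-up stand-ins for x_test and y_test.
x_test = np.arange(10).reshape(5, 2)
y_test = np.array([0, 1, 0, 1, 1])

for features, labels in preview_batches({WORDS_FEATURE: x_test}, y_test, batch_size=2):
    print(features[WORDS_FEATURE].shape, labels)
```

This never touches a session, so nothing can hang, but it also says nothing about what the graph tensors actually contain; for that, the queue-runner loop above is still needed.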

Alternatively, I'd encourage you to check out the tf.data.Dataset interface (possibly tf.contrib.data.Dataset in TensorFlow 1.3 or earlier). With Dataset.from_tensor_slices you can get similar input/label tensors without using queues. Creation is slightly more involved, but the interface is much more flexible and the implementation doesn't use queue runners, so running the session is much simpler.

import tensorflow as tf
import numpy as np

x_data = np.random.random((100000, 2))
y_data = np.random.random((100000,))

batch_size = 2
buff = 100


def input_fn():
    # possibly tf.contrib.data.Dataset.from... in tf 1.3 or earlier
    dataset = tf.data.Dataset.from_tensor_slices((x_data, y_data))
    dataset = dataset.repeat().shuffle(buff).batch(batch_size)
    x, y = dataset.make_one_shot_iterator().get_next()
    return x, y


x, y = input_fn()
with tf.Session() as sess:
    print(sess.run([x, y]))
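
To mirror the question's setup more closely, the same approach can carry a feature dictionary and stop after one unshuffled pass. This is a rough sketch, not the tutorial's code: the x_test and y_test values and the batch size of 4 are made up, and it assumes a TensorFlow version that provides tf.compat.v1 (1.13 or later, including 2.x).

```python
import numpy as np
import tensorflow as tf

WORDS_FEATURE = "words"  # feature key used in the question

# Made-up stand-ins for the tutorial's test arrays.
x_test = np.arange(12, dtype=np.int64).reshape(6, 2)
y_test = np.arange(6, dtype=np.int64)


def test_input_fn():
    # One pass, original order: roughly num_epochs=1, shuffle=False.
    dataset = tf.data.Dataset.from_tensor_slices(({WORDS_FEATURE: x_test}, y_test))
    dataset = dataset.batch(4)
    return tf.compat.v1.data.make_one_shot_iterator(dataset).get_next()


batches = []
with tf.Graph().as_default():
    features, labels = test_input_fn()
    with tf.compat.v1.Session() as sess:
        while True:
            try:
                # Fetch both together so features and labels stay aligned.
                batches.append(sess.run((features, labels)))
            except tf.errors.OutOfRangeError:
                # Raised once the single pass over the data is exhausted.
                break

for features_data, labels_data in batches:
    print(features_data[WORDS_FEATURE].shape, labels_data)
```

Fetching features and labels in separate sess.run calls would advance the iterator twice, so the two would no longer belong to the same batch.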
