Bruno Oliveira

Reputation: 589

TensorFlow: Reading a CSV file hangs forever

I've followed the tutorial, and the CSV example shown there doesn't seem to work. It gets stuck forever...

Here's the code:

import tensorflow as tf

filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [[1], [1], [1], [1], [1]]
col1, col2, col3, col4, col5 = tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.pack([col1, col2, col3, col4])

with tf.Session() as sess:
  # Start populating the filename queue.
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)

  for i in range(1200):
    # Retrieve a single instance:
    example, label = sess.run([features, col5])

  coord.request_stop()
  coord.join(threads)

I'm using TensorFlow 0.7.1 and Python 3.

What am I doing wrong?

My files have only this row:

5,4,3,2,1

Upvotes: 4

Views: 1379

Answers (2)

ReaddyEddy

Reputation: 339

It may save some people grief to note that, in TensorFlow 0.10, both the amendment from Bruno Oliveira and the Python 3 print formatting are required. I have not seen this working in Python 2.x, even with the correct 2.x print syntax.

import tensorflow as tf

filename_queue = tf.train.string_input_producer(["./iris.data"])

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [tf.constant([], dtype=tf.float32),   # Column 0
                   tf.constant([], dtype=tf.float32),   # Column 1
                   tf.constant([], dtype=tf.float32),   # Column 2
                   tf.constant([], dtype=tf.float32),   # Column 3
                   tf.constant([], dtype=tf.string)]    # Column 4 (label)
col1, col2, col3, col4, col5 = tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.pack([col1, col2, col3, col4])

config = tf.ConfigProto(inter_op_parallelism_threads=2)
with tf.Session(config=config) as sess:
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(1200):
        # Retrieve a single instance:
        example, label = sess.run([features, col5])
        print(example, label)

    coord.request_stop()
    coord.join(threads)

Upvotes: 0

mrry

Reputation: 126154

Thanks for your persistence in trying to debug this. It turns out that you were running into a bug that was fixed in a recent commit, but the fix hasn't made it into a release yet. There are two possible fixes (other than acquiring more processors):

  1. Upgrade to the nightly binary release or install from source, to get the fix.
  2. In your Python program, add the following to the session creation:

    config = tf.ConfigProto(inter_op_parallelism_threads=2)
    with tf.Session(config=config) as sess:
      # ...
    

The reason for the issue is that TensorFlow uses a bounded threadpool for dispatching ops, and (until the fix) the reader op could block, which would lead to deadlock if another op had to run before the reader could complete (for example because of a producer-consumer relationship). The fix addresses this by running the reader asynchronously.
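For reference, here is a minimal sketch of the question's program with option 2 applied. It uses the same queue-runner API as the original snippet; the only change is the explicit inter-op thread count passed to the session.

import tensorflow as tf

filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [[1], [1], [1], [1], [1]]
col1, col2, col3, col4, col5 = tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.pack([col1, col2, col3, col4])

# Workaround 2: give the inter-op threadpool at least two threads, so the
# blocking reader op cannot starve the ops it depends on.
config = tf.ConfigProto(inter_op_parallelism_threads=2)
with tf.Session(config=config) as sess:
  # Start populating the filename queue.
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)

  for i in range(1200):
    # Retrieve a single instance:
    example, label = sess.run([features, col5])

  coord.request_stop()
  coord.join(threads)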

Upvotes: 4
