Reputation: 589
I followed the tutorial, but the CSV example shown there doesn't seem to work: it gets stuck forever.
Here's the code:
import tensorflow as tf
filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [[1], [1], [1], [1], [1]]
col1, col2, col3, col4, col5 = tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.pack([col1, col2, col3, col4])
with tf.Session() as sess:
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(1200):
        # Retrieve a single instance:
        example, label = sess.run([features, col5])
    coord.request_stop()
    coord.join(threads)
I'm using TensorFlow 0.7.1 and Python 3.
What am I doing wrong?
My files have only this row:
5,4,3,2,1
Upvotes: 4
Views: 1379
Reputation: 339
It may save some people grief to note that in TensorFlow 0.10, both the amendment from Bruno Oliveira and the formatting below are required in Python 3.0. I have not seen this working in Python 2.x, even with the correct 2.x print syntax.
import tensorflow as tf
filename_queue = tf.train.string_input_producer(["./iris.data"])
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [tf.constant([], dtype=tf.float32),  # Column 0
                   tf.constant([], dtype=tf.float32),  # Column 1
                   tf.constant([], dtype=tf.float32),  # Column 2
                   tf.constant([], dtype=tf.float32),
                   tf.constant([], dtype=tf.string)]
col1, col2, col3, col4, col5 = tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.pack([col1, col2, col3, col4])
config = tf.ConfigProto(inter_op_parallelism_threads=2)
with tf.Session(config=config) as sess:
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(1200):
        # Retrieve a single instance:
        example, label = sess.run([features, col5])
        print(example, label)
    coord.request_stop()
    coord.join(threads)
Upvotes: 0
Reputation: 126154
Thanks for your persistence in trying to debug this. It turns out that you were running into a bug that was fixed in a recent commit, but the fix hasn't made it into a release yet. There are two possible fixes (other than acquiring more processors):
In your Python program, add the following to the session creation:
config = tf.ConfigProto(inter_op_parallelism_threads=2)
with tf.Session(config=config) as sess:
    # ...
The reason for the issue is that TensorFlow uses a bounded threadpool for dispatching ops, and (until the fix) the reader op could block, which would lead to deadlock if another op had to run before the reader could complete (for example because of a producer-consumer relationship). The fix addresses this by running the reader asynchronously.
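To make the producer-consumer deadlock concrete, here is a rough pure-Python analogy. It is not TensorFlow's actual implementation: the names `run_pipeline`, `reader`, and `producer` are made up for illustration, and a timeout stands in for "stuck forever" so the script terminates. With one pool thread the blocking reader occupies the only worker, so the producer never gets to run; with two threads the producer runs alongside it and the read completes.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(pool_size, timeout=0.5):
    """Return True if the reader received an item, False if it stalled.

    Hypothetical stand-in for a bounded op-dispatch pool; not TensorFlow code.
    """
    q = queue.Queue()

    def reader():
        # Blocks a pool thread until an item arrives. The timeout only
        # exists so this demo ends instead of hanging like the real bug.
        try:
            return q.get(timeout=timeout)
        except queue.Empty:
            return None

    def producer():
        # Plays the role of the op that feeds the filename queue.
        q.put("file0.csv")

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        read_future = pool.submit(reader)  # scheduled first, starts blocking
        pool.submit(producer)              # needs a free pool thread to run
        return read_future.result() is not None


if __name__ == "__main__":
    print(run_pipeline(1))  # False: reader holds the only thread
    print(run_pipeline(2))  # True: producer runs on the second thread
```

Setting `inter_op_parallelism_threads=2` corresponds to the second case: it leaves a spare thread for the op that the blocked reader is waiting on.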
Upvotes: 4