Reputation: 4850
I have a CSV file, blah.txt, that looks like:
1,2
3,4
I can read the csv as follows:
import tensorflow as tf
sess = tf.InteractiveSession()
csv_train= tf.read_file('blah.txt')
csv_train.eval()
Which outputs:
Out[5]: '1,2\n3,4'
I'm trying to decode the csv as follows:
col1,col2 = tf.decode_csv(csv_train,
record_defaults=[tf.constant([],dtype=tf.int32),
tf.constant([],dtype=tf.int32)])
Now when I run col1.eval(), I get the error:
W tensorflow/core/common_runtime/executor.cc:1102] 0x7ff203f17240 Compute status: Invalid argument: Unquoted fields cannot have quotes/CRLFs inside
[[Node: DecodeCSV_6 = DecodeCSV[OUT_TYPE=[DT_INT32, DT_INT32], field_delim=",", _device="/job:localhost/replica:0/task:0/cpu:0"](ReadFile, Const_12, Const_13)]]
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3066, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-17-dc904e64a78b>", line 1, in <module>
col1.eval()
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 465, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3097, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 315, in run
return self._run(None, fetches, feed_dict)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 511, in _run
feed_dict_string)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 564, in _do_run
target_list)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 586, in _do_call
e.code)
InvalidArgumentError: Unquoted fields cannot have quotes/CRLFs inside
[[Node: DecodeCSV_6 = DecodeCSV[OUT_TYPE=[DT_INT32, DT_INT32], field_delim=",", _device="/job:localhost/replica:0/task:0/cpu:0"](ReadFile, Const_12, Const_13)]]
Caused by op u'DecodeCSV_6', defined at:
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevconsole.py", line 488, in <module>
pydevconsole.StartServer(pydev_localhost.get_localhost(), int(port), int(client_port))
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevconsole.py", line 334, in StartServer
process_exec_queue(interpreter)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevconsole.py", line 209, in process_exec_queue
more = interpreter.addExec(code_fragment)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydev_console_utils.py", line 201, in addExec
more = self.doAddExec(code_fragment)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydev_ipython_console.py", line 42, in doAddExec
res = bool(self.interpreter.addExec(codeFragment.text))
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydev_ipython_console_011.py", line 435, in addExec
self.ipython.run_cell(line, store_history=True)
File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2902, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3006, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3066, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-16-0556330e1530>", line 3, in <module>
tf.constant([],dtype=tf.int32)])
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_parsing_ops.py", line 38, in decode_csv
field_delim=field_delim, name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2040, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1087, in __init__
self._traceback = _extract_stack()
How can I decode this csv?
Upvotes: 4
Views: 4789
Reputation: 126194
The tf.read_file() op reads the entire contents of the given file into a single string, whereas the tf.decode_csv() op expects each element of its input to be a single record (i.e. one line). Therefore you need something that reads one line at a time, which tf.TextLineReader supports.
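To make the mismatch concrete, here is a plain-Python sketch (no TensorFlow involved) of the difference between handing a parser the whole file as one string and handing it one line at a time. The helper name parse_record is hypothetical and only mimics the one-record-per-element expectation described above; it is not how tf.decode_csv() is implemented.

```python
def parse_record(record):
    """Parse one CSV record (a single line) into a list of ints.

    Hypothetical helper: it rejects embedded newlines, loosely mimicking
    the one-record-per-element expectation of a CSV decoder.
    """
    if "\n" in record or "\r" in record:
        raise ValueError("a record must be a single line")
    return [int(field) for field in record.split(",")]


whole_file = "1,2\n3,4"  # what reading the entire file at once yields

# Passing the whole file as a single record fails ...
try:
    parse_record(whole_file)
    whole_file_ok = True
except ValueError:
    whole_file_ok = False

# ... whereas passing one line at a time works.
rows = [parse_record(line) for line in whole_file.splitlines()]
print(whole_file_ok, rows)
```

The same idea applies to the TensorFlow ops: the decoder needs its input already split into records.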
Using a reader is slightly more complicated than using a simple op, because it's designed for reading large multi-file datasets with a lot of flexibility in how the files are chosen. You can see the tutorial for a complete explanation, but the following example code should help to get you started:
# Read the file once.
filenames = tf.train.string_input_producer(["blah.txt"], num_epochs=1)
reader = tf.TextLineReader()
_, line = reader.read(filenames)
col1, col2 = tf.decode_csv(line,
record_defaults=[tf.constant([],dtype=tf.int32),
tf.constant([],dtype=tf.int32)])
Now col1 and col2 represent single values. If you evaluate them, you'll get the contents of the next line:
# N.B. These must be called before evaluating the inputs.
sess.run(tf.initialize_all_variables())
tf.train.start_queue_runners(sess)
print sess.run([col1, col2]) # ==> 1, 2
print sess.run([col1, col2]) # ==> 3, 4
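The "one record per evaluation" behaviour can be pictured with an ordinary Python generator. This is only an analogy, assuming the file holds the two lines shown in the question; read_lines_once is a hypothetical name, and the real reader is backed by queues rather than generators.

```python
def read_lines_once(text):
    """Yield one parsed (col1, col2) pair per call, for a single pass."""
    for line in text.splitlines():
        col1, col2 = (int(field) for field in line.split(","))
        yield col1, col2


records = read_lines_once("1,2\n3,4")
first = next(records)   # values from the first line
second = next(records)  # values from the second line

# After a single pass the source is exhausted, loosely similar to an
# input queue running dry once its one epoch has been consumed.
try:
    next(records)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)
```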
If instead you want to batch the columns, you can use tf.train.batch():
_, line = reader.read(filenames)
# N.B. Batch size is 2, to match the size of your file.
line_batch, = tf.train.batch([line], batch_size=2)
col1, col2 = tf.decode_csv(line_batch,
record_defaults=[tf.constant([],dtype=tf.int32),
tf.constant([],dtype=tf.int32)])
sess.run(tf.initialize_all_variables())
tf.train.start_queue_runners(sess)
print sess.run([col1, col2]) # ==> [1, 3], [2, 4]
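As a rough plain-Python picture of what the batched result contains (an illustration only, not the TensorFlow implementation): decoding a batch of lines yields one list per column, which amounts to transposing the parsed rows.

```python
lines = ["1,2", "3,4"]  # a batch of two records, matching the example file

# Parse each line into a row of ints, then transpose rows into columns.
rows = [[int(field) for field in line.split(",")] for line in lines]
col1, col2 = [list(column) for column in zip(*rows)]

print(col1, col2)
```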
Upvotes: 4