sthomps

Reputation: 4850

tf.decode_csv() error: "Unquoted fields cannot have quotes/CRLFs inside"

I have a csv file blah.txt that looks like:

1,2
3,4

I can read the csv as follows:

import tensorflow as tf
sess = tf.InteractiveSession()
csv_train = tf.read_file('blah.txt')
csv_train.eval()

Which outputs:

Out[5]: '1,2\n3,4'

I'm trying to decode the csv as follows:

col1,col2 = tf.decode_csv(csv_train,
                          record_defaults=[tf.constant([],dtype=tf.int32),
                                           tf.constant([],dtype=tf.int32)])

Now when I run col1.eval() I get the error:

W tensorflow/core/common_runtime/executor.cc:1102] 0x7ff203f17240 Compute status: Invalid argument: Unquoted fields cannot have quotes/CRLFs inside
     [[Node: DecodeCSV_6 = DecodeCSV[OUT_TYPE=[DT_INT32, DT_INT32], field_delim=",", _device="/job:localhost/replica:0/task:0/cpu:0"](ReadFile, Const_12, Const_13)]]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3066, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-17-dc904e64a78b>", line 1, in <module>
    col1.eval()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 465, in eval
    return _eval_using_default_session(self, feed_dict, self.graph, session)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3097, in _eval_using_default_session
    return session.run(tensors, feed_dict)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 315, in run
    return self._run(None, fetches, feed_dict)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 511, in _run
    feed_dict_string)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 564, in _do_run
    target_list)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 586, in _do_call
    e.code)
InvalidArgumentError: Unquoted fields cannot have quotes/CRLFs inside
     [[Node: DecodeCSV_6 = DecodeCSV[OUT_TYPE=[DT_INT32, DT_INT32], field_delim=",", _device="/job:localhost/replica:0/task:0/cpu:0"](ReadFile, Const_12, Const_13)]]
Caused by op u'DecodeCSV_6', defined at:
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevconsole.py", line 488, in <module>
    pydevconsole.StartServer(pydev_localhost.get_localhost(), int(port), int(client_port))
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevconsole.py", line 334, in StartServer
    process_exec_queue(interpreter)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevconsole.py", line 209, in process_exec_queue
    more = interpreter.addExec(code_fragment)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydev_console_utils.py", line 201, in addExec
    more = self.doAddExec(code_fragment)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydev_ipython_console.py", line 42, in doAddExec
    res = bool(self.interpreter.addExec(codeFragment.text))
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydev_ipython_console_011.py", line 435, in addExec
    self.ipython.run_cell(line, store_history=True)
  File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2902, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3006, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3066, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-16-0556330e1530>", line 3, in <module>
    tf.constant([],dtype=tf.int32)])
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_parsing_ops.py", line 38, in decode_csv
    field_delim=field_delim, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2040, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1087, in __init__
    self._traceback = _extract_stack()

How can I decode this csv?

Upvotes: 4

Views: 4789

Answers (1)

mrry

Reputation: 126194

The tf.read_file() op reads the entire contents of the given file into a single string, whereas the tf.decode_csv() op expects each element of its input to be a single record (i.e. one line). In your case the string '1,2\n3,4' contains a newline inside what the op treats as one record, which triggers the error. You therefore need something that reads one line at a time, which tf.TextLineReader does.
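
To see the file-versus-record distinction outside TensorFlow, here is a minimal plain-Python sketch. The helper name decode_record is hypothetical and only mimics the one-record-per-string expectation; it is not how tf.decode_csv() is implemented.

```python
def decode_record(record):
    """Parse one CSV record (a single line) into a list of ints.

    Hypothetical helper: it rejects line breaks inside a record, loosely
    mirroring the check behind the error message in the question.
    """
    if "\n" in record or "\r" in record:
        raise ValueError("Unquoted fields cannot have quotes/CRLFs inside")
    return [int(field) for field in record.split(",")]


whole_file = "1,2\n3,4"  # the string that tf.read_file() returned above

# Passing the whole file as a single record fails.
try:
    decode_record(whole_file)
    failed = False
except ValueError:
    failed = True

# Passing one line at a time works.
rows = [decode_record(line) for line in whole_file.splitlines()]
print(failed, rows)
```

Splitting into lines before decoding is, roughly, the job that a line-oriented reader takes over in the TensorFlow pipeline.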

Using a reader is slightly more complicated than using a simple op, because it's designed for reading large multi-file datasets with a lot of flexibility in how the files are chosen. You can see the tutorial for a complete explanation, but the following example code should help to get you started:

# Read the file once.
filenames = tf.train.string_input_producer(["blah.txt"], num_epochs=1)

reader = tf.TextLineReader()

_, line = reader.read(filenames)

col1, col2 = tf.decode_csv(line,
                           record_defaults=[tf.constant([],dtype=tf.int32),
                                            tf.constant([],dtype=tf.int32)])

Now col1 and col2 represent single values. If you evaluate them, you'll get the contents of the next line:

# N.B. These must be called before evaluating the inputs.
sess.run(tf.initialize_all_variables())
tf.train.start_queue_runners(sess)

print sess.run([col1, col2])  # ==> 1, 2
print sess.run([col1, col2])  # ==> 3, 4

If instead you want to batch the columns, you can use tf.train.batch():

_, line = reader.read(filenames)

# N.B. Batch size is 2, to match the size of your file.
line_batch, = tf.train.batch([line], batch_size=2)

col1, col2 = tf.decode_csv(line_batch,
                           record_defaults=[tf.constant([],dtype=tf.int32),
                                            tf.constant([],dtype=tf.int32)])

sess.run(tf.initialize_all_variables())
tf.train.start_queue_runners(sess)

print sess.run([col1, col2])  # ==> [1, 3], [2, 4]
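
The shape of the batched result can be surprising: you get one sequence per column, not one per line. As a rough plain-Python analogy (not TensorFlow code, and the rows list is just the two records from blah.txt written out by hand), the batched decode behaves like a transpose:

```python
rows = [[1, 2], [3, 4]]  # the two records in blah.txt, one list per line

# Decoding a batch of lines yields one sequence per column,
# which is the same as transposing the list of rows.
col1, col2 = (list(col) for col in zip(*rows))
print(col1, col2)
```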

Upvotes: 4
