Ramnath

Reputation: 67

TensorFlow CSV decode error

I am using TensorFlow 0.10.0rc0 with CUDA driver version 7.5 and cuDNN 4 on Ubuntu 14.04.

I have a simple CSV file which has a single line like this:

"field with
newline",0

where the newline was added by pressing Enter in Vim on Ubuntu. I am able to read this file in pandas using the read_csv function, and the text field is shown as containing a single \n character.

But when I try to read it in TensorFlow, I get the following error:

tensorflow.python.framework.errors.InvalidArgumentError: Quoted field has to end with quote followed by delim or end

My TensorFlow code uses this function to read a single row of the CSV:

import tensorflow as tf

def read_single_example(filename_queue, skip_header_lines, record_defaults, feature_index, label_index):
    # TextLineReader emits one physical line of the file per read
    reader = tf.TextLineReader(skip_header_lines=skip_header_lines)
    key, value = reader.read(filename_queue)
    # Parse that single line into one tensor per CSV column
    record = tf.decode_csv(
        value,
        record_defaults=record_defaults)
    features, label = record[feature_index], record[label_index]
    return features, label

If I read using pandas and replace all newlines with spaces, the TensorFlow code is able to parse the CSV successfully.
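The same preprocessing can be sketched with only Python's standard csv module rather than pandas. This is a minimal illustration, not the exact code used above: flatten_newlines is a hypothetical helper name, and it assumes the whole file fits in memory as a string.

```python
import csv
import io


def flatten_newlines(text):
    """Return CSV text with newlines inside fields replaced by spaces.

    Hypothetical helper for illustration; assumes `text` is the full
    contents of a comma-separated file.
    """
    # newline="" stops StringIO from translating line endings, so the
    # csv module sees the embedded newline exactly as stored in the file.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Replace CRLF first so it becomes one space, then bare LF.
        writer.writerow(
            [field.replace("\r\n", " ").replace("\n", " ") for field in row]
        )
    return out.getvalue()


# The single-record file from the question.
cleaned = flatten_newlines('"field with\nnewline",0\n')
```

After this step every record occupies exactly one physical line, which is what a line-based reader needs.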

However, it would be really helpful if newlines could be handled within the TensorFlow CSV pipeline itself.

Upvotes: 1

Views: 744

Answers (2)

rlys

Reputation: 480

The issue here is that TextLineReader splits the file on newlines before the CSV decoder parses it, so the decoder receives the first physical line on its own, with an opening quote and no closing quote. With tf.data, you can use tf.contrib.data.CsvDataset, which parses this file correctly according to RFC 4180.
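The difference between splitting on lines first and parsing records directly can be illustrated without TensorFlow. The sketch below uses Python's standard csv module as a stand-in for an RFC 4180-aware parser; it is an analogy for the behaviour described, not TensorFlow's own implementation.

```python
import csv
import io

# The file from the question: one record whose quoted field contains a newline.
data = '"field with\nnewline",0\n'

# Line-first reading: split on newlines, then hand each piece to a parser.
# The single record is cut into two fragments, and the first fragment has
# an opening quote with no closing quote.
lines = data.splitlines()

# Record-aware parsing: the parser consumes the stream itself and treats a
# newline inside quotes as part of the field. newline="" keeps StringIO
# from translating line endings before the parser sees them.
rows = list(csv.reader(io.StringIO(data, newline="")))

print(lines)
print(rows)
```

The first print shows two fragments, neither of which is a valid record, while the second shows a single row with two fields.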

Upvotes: 1

Eric Platon

Reputation: 10122

In my experience, TensorFlow's CSV reader is pretty strict about RFC 4180.

Making sure your files use CRLF at the end of each line, as well as in quoted fields, should allow processing.

Note: I have only used this up to version 0.9 so far. I have not tried the 0.10 release candidates.

Upvotes: 0
