S.Richmond

Reputation: 11558

RAILS 3 CSV "Illegal quoting" is a lie

I've hit a problem during parsing of a CSV file where I get the following error:

CSV::MalformedCSVError: Illegal quoting on line 3.

RAILS code in question:

csv = CSV.read(args.local_file_path, col_sep: "\t", headers: true)

Line 3 in the CSV file is:

A-067067        VO  VIA CE  0   8   8   SWCH            Ter 4, Loc Is Here, Mne,    Per Fl                                  Auia/Sey    IMAC            NEK_HW      2011-03-09 09:47:44 2011-03-09 11:50:26 2011-01-13 10:49:17 2011-02-14 14:02:43 2011-02-14 14:02:44 0   0   771 771 46273   "[O/H 15/02] B270 W31 ""TEXT TEXT 2 X TEXT SWITC"   SOME_TEXT       SOME_TEXT       N/A Name Here                               RESOLVED_CLOSED RESOLVED_CLOSED

UPDATE: Tabs don't appear to have come across above. See pastebin RAW TEXT: http://pastebin.com/4gj7iUpP

I've read numerous threads on StackOverflow and elsewhere about what usually causes this error, and I understand those cases. But the CSV row above has perfectly legal quoting, does it not? The CSV is tab delimited, and the quote on either side of the column in question is adjacent to a tab with nothing in between. There is one quote inside that field, and it is doubled to escape it. So what gives? I can't work it out. :(

Assuming I've got something wrong here, I'd like the solution to include a way to work around the issue as I don't have control over how the CSV is constructed.

Upvotes: 1

Views: 2481

Answers (1)

mu is too short

Reputation: 434685

This part of your CSV is at fault:

46273   "[O/H 15/02] B270 W31 ""TEXT TEXT 2 X TEXT SWITC"   SOME_TEXT

At least one of these parts has a stray space:

46273   "
"   SOME_TEXT

I'd guess that the "3" and the double quote are supposed to be separated by one or more tabs, but there is a space before the quote. Or there is a space after the quote at the other end, where there should only be tabs between the closing quote and the "S".

CSV escapes double quotes by doubling them, so this:

"[O/H 15/02] B270 W31 ""TEXT TEXT 2 X TEXT SWITC"

is supposed to be a single field that contains an embedded quote:

[O/H 15/02] B270 W31 "TEXT TEXT 2 X TEXT SWITC

If you have a space before the first quote or after the last quote then, since your fields are tab delimited, you have an unescaped double quote inside a field and that's where your "illegal quoting" error comes from.
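To see both behaviours side by side, here is a minimal sketch using Ruby's standard CSV library. The two sample rows are shortened stand-ins for the line in the question (with explicit "\t" tabs), not the real file, and the exact error message differs between Ruby versions, so only the exception class is checked.

```ruby
require "csv"

# Well-formed: the quotes sit directly against the tab delimiters and the
# embedded quote is doubled.
good = "46273\t\"[O/H 15/02] B270 W31 \"\"TEXT TEXT 2 X TEXT SWITC\"\tSOME_TEXT"
fields = CSV.parse_line(good, col_sep: "\t")
# fields[1] is the field without its outer quotes and with one embedded quote.

# Malformed (assumed scenario): a single space sneaks in before the opening
# quote, so the field is treated as unquoted and the quote inside is illegal.
bad = "46273\t \"[O/H 15/02] B270 W31\"\tSOME_TEXT"
error = nil
begin
  CSV.parse_line(bad, col_sep: "\t")
rescue CSV::MalformedCSVError => e
  error = e
end

puts fields[1]
puts error.class
```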

Try sending your CSV file through cat -t (which should represent tabs as ^I) to find where the stray space is.
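If you would rather do the same check from Ruby, something along these lines might work. It is only a sketch under the assumption that the culprit really is a space between a tab and a quote; suspicious_lines and strip_stray_spaces are hypothetical helper names, not part of the CSV library, and the patterns could misfire on a quoted field whose own text contains a tab followed by spaces and a quote.

```ruby
require "csv"

# A quote preceded by spaces right after a tab (or the start of the row),
# or a quote followed by spaces right before a tab (or the end of the row).
STRAY_BEFORE = /(\A|\t) +"/
STRAY_AFTER  = /" +(\t|\z)/

# Returns [line_number, inspected_row] pairs for rows that look suspicious.
# String#inspect shows tabs as \t, much like cat -t shows them as ^I.
def suspicious_lines(text)
  hits = []
  text.each_line.with_index(1) do |line, number|
    row = line.chomp
    hits << [number, row.inspect] if row =~ STRAY_BEFORE || row =~ STRAY_AFTER
  end
  hits
end

# Workaround sketch: drop the stray spaces so the quotes touch the tabs again.
def strip_stray_spaces(row)
  row.gsub(STRAY_BEFORE) { "#{$1}\"" }.gsub(STRAY_AFTER) { "\"#{$1}" }
end

# Possible usage, assuming the file fits in memory:
#   raw     = File.read(path)
#   cleaned = raw.each_line.map { |l| strip_stray_spaces(l.chomp) }.join("\n")
#   csv     = CSV.parse(cleaned, col_sep: "\t", headers: true)
```

Cleaning the text before handing it to the parser is one way to cope when you don't control how the file is produced, but it is worth running the detection step first to confirm that stray spaces are actually the problem.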

Upvotes: 1
