Tom Melly

Reputation: 373

CSV specification - double quotes at the start and end of fields

Question (because I can't work it out): should ""hello world"" be a valid field value in a CSV file according to the specification?

i.e. should:

1,""hello world"",9.5

be a valid CSV record?

(If so, then the Perl CSV-XS parser I'm using is mildly broken, but if not, then $line =~ s/\342\200\234/""/g; is a really bad idea ;) )

The weird thing is that this code has been running without issue for years, but we've only just hit a record that both started with a left double quote and contained no comma (the substitution above is from a CSV pre-parser).

Upvotes: 1

Views: 1404

Answers (2)

oᴉɹǝɥɔ

Reputation: 2055

That depends on the escape character you use. If your escape character is '"' (double quote) then your line should look like

1,"""hello world""",9.5

If your escape character is '\' (backslash) then your line should look like

1,"\"hello world\"",9.5

Check your parser/environment defaults, or explicitly configure your parser with the escape character you need. For example, to use backslash:

my $csv = Text::CSV_XS->new ({ quote_char => '"', escape_char => "\\" });
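To see the two conventions side by side, here is a minimal sketch using Python's standard csv module as a stand-in (an assumption made for illustration; it is not Text::CSV_XS, and the helper name parse_line is hypothetical). Its doublequote and escapechar options play roughly the role that escape_char plays above.

```python
import csv
import io


def parse_line(line, **fmt):
    """Parse a single CSV line and return its list of fields."""
    return next(csv.reader(io.StringIO(line), **fmt))


# Convention 1: the quote character escapes itself by doubling
# (this is the csv module's default behaviour).
doubled = parse_line('1,"""hello world""",9.5')

# Convention 2: backslash is the escape character, doubling is switched off.
backslashed = parse_line(
    '1,"\\"hello world\\"",9.5',
    doublequote=False,
    escapechar="\\",
)

# Both spellings decode to the same field value: "hello world" with its quotes.
print(doubled)
print(backslashed)
```

Whichever convention you pick, the writer and the reader have to agree on it; mixing them is what produces surprising fields.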

Upvotes: 0

Patrick Mevzek

Reputation: 12505

The canonical format definition of CSV is https://www.rfc-editor.org/rfc/rfc4180.txt. It says:

  1. Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields. For example:

    "aaa","bbb","ccc" CRLF
    zzz,yyy,xxx

  2. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:

    "aaa","b CRLF
    bb","ccc" CRLF
    zzz,yyy,xxx

  3. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

    "aaa","b""bb","ccc"

The last rule means your line should have been:

1,"""hello world""",9.5
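As a quick check of that record, here is a small sketch with Python's standard csv module (used here only as an illustration, not the Perl parser from the question). Its default writer quotes a field that contains a double quote and doubles the embedded quotes, which should match the rule quoted above.

```python
import csv
import io

# Write one record whose middle field literally contains double quotes.
buf = io.StringIO()
csv.writer(buf).writerow([1, '"hello world"', 9.5])
record = buf.getvalue()

# Read the record back to confirm the field value survives the round trip.
fields = next(csv.reader(io.StringIO(record)))

print(repr(record))
print(fields)
```

The writer terminates the record with CRLF by default, which is also the line break RFC 4180 describes.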

But not all parsers and generators follow this standard perfectly, so you may need to relax some rules for interoperability. It all depends on how much control you have over the code that writes the CSV and the code that parses it.
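How a parser treats the malformed record from the question is implementation-specific. The sketch below uses Python's standard csv module purely as an example of one parser with a lenient mode and a strict mode; it says nothing about what Text::CSV_XS does, and the names MALFORMED and parse are hypothetical.

```python
import csv
import io

# The record from the question: quotes inside the field are not escaped.
MALFORMED = '1,""hello world"",9.5'


def parse(line, strict):
    """Parse one CSV line, optionally asking the reader to reject bad input."""
    return next(csv.reader(io.StringIO(line), strict=strict))


# Lenient mode: the reader accepts the line, but the middle field
# does not come back as the intended "hello world" (with quotes).
lenient_fields = parse(MALFORMED, strict=False)

# Strict mode: the same line is rejected with csv.Error.
try:
    parse(MALFORMED, strict=True)
    strict_rejected = False
except csv.Error:
    strict_rejected = True

print(lenient_fields)
print(strict_rejected)
```

A lenient parser silently returning a different value is arguably worse than an error, so if you control the writing side it is safer to emit the properly escaped form.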

Upvotes: 5
