Rush Frisby

Reputation: 11454

New lines in tab delimited or comma delimited output

I am looking for best practices for handling CSV and tab-delimited files.

For CSV files I already apply some formatting when a value contains a comma or double quote, but what if the value contains a newline character? Should I leave the newline intact, enclose the value in double quotes, and escape any double quotes within the value?

Same question for tab delimited files. I assume the answer would be very similar if not the same.

Upvotes: 7

Views: 14717

Answers (3)

peak

Reputation: 116750

For TSV, if you want lossless representation of values, the "Linear TSV" specification is worth considering: http://paulfitz.github.io/dataprotocols/linear-tsv/index.html

Because a raw tab or newline inside a value would be read as a field or record separator, most such conventions define at least the following escapes:

   \n for newline,
   \t for tab,
   \r for carriage return,
   \\ for backslash

Some tools add \0 for NUL.
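
To make the escaping concrete, here is a minimal sketch in Python of the four escapes listed above. The helper names (escape_field, unescape_field, write_row, read_row) are made up for illustration and do not come from any library, and keeping unknown escape sequences verbatim is an assumption of this sketch, not something the spec is quoted on here.

```python
# Hypothetical helpers illustrating backslash escaping for TSV fields.
# Names and the handling of unknown escapes are assumptions of this sketch.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def escape_field(value: str) -> str:
    """Replace backslash, tab, newline and carriage return with escapes."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(text: str) -> str:
    """Reverse escape_field, scanning left to right one escape at a time."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # Unknown escapes are kept as-is (assumption of this sketch).
            out.append(_UNESCAPES.get(nxt, ch + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def write_row(fields):
    """Join escaped fields with tabs; the result contains no raw newline."""
    return "\t".join(escape_field(f) for f in fields)


def read_row(line):
    """Split one line (without its line terminator) back into raw values."""
    return [unescape_field(f) for f in line.split("\t")]
```

Since every tab and newline inside a value is escaped, each record stays on one physical line and a plain split on tab is enough to read it back.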

Upvotes: 0

Robert Hui

Reputation: 744

@Jack is right that your best bet is to keep the \n unaltered, since a reader will expect to find it inside double quotes when a value spans lines.

As with most things, I think consistency here is key. As far as I know, your values only need to be double-quoted if they span multiple lines, contain commas, or contain double quotes. In some implementations I've seen, all values are escaped and double-quoted, since it makes the parsing algorithm simpler (the writer never has to decide whether to quote a value, and the reader never has to decide whether to unquote one).

This isn't the most space-efficient solution, but it makes reading and writing the file trivial, both for your own library and for others that may consume it in the future.
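
As a rough illustration of the quote-everything approach, here is a sketch using Python's standard csv module with csv.QUOTE_ALL. Python is my choice for the example, not something the question specifies, and to_csv_quote_all is a made-up helper name.

```python
import csv
import io


def to_csv_quote_all(rows):
    """Serialize rows with every field double-quoted (hypothetical helper)."""
    # newline="" stops StringIO from translating the writer's line endings.
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buf.getvalue()


rows = [["a", "b,c", 'say "x"', "two\nlines"]]
text = to_csv_quote_all(rows)

# Reading back needs no special handling: the reader unquotes every field.
parsed = list(csv.reader(io.StringIO(text, newline="")))
```

Here text is '"a","b,c","say ""x""","two\nlines"\r\n' and parsed equals rows, so the embedded newline survives the round trip.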

Upvotes: 0

Jack

Reputation: 133577

Usually you keep \n unaltered, relying on the fact that the newline character is enclosed in a double-quoted string. This doesn't create ambiguities, but it's really ugly if you have to look at the file in a normal text editor.

But that is how you should do it, since nothing inside a quoted string in a CSV is escaped except the double quote itself.
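
For what it's worth, this is also what Python's standard csv module does by default, so here is a small sketch of it (Python is just my pick for the example; the sample row is invented). The writer quotes only the fields that need it, doubles any embedded double quote, and leaves the newline inside the quotes untouched.

```python
import csv
import io

# Sample data (made up): the second field has a quote, a comma and a newline.
rows = [["id", "note"], ["1", 'He said "hi",\nthen left']]

# newline="" keeps StringIO from translating the writer's "\r\n" terminators.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# The reader treats the newline inside the quoted field as part of the value.
parsed = list(csv.reader(io.StringIO(text, newline="")))
```

The resulting text is 'id,note\r\n1,"He said ""hi"",\nthen left"\r\n', which spans three physical lines but parses back into the original two rows.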

Upvotes: 5
