Are there any alternatives to the Python csv module that can quote newlines and use arbitrary characters as newlines for reading?

Question

Python's csv module is hard-coded (in C) to treat a carriage return or a line feed as the end of a row when reading; the reader ignores the dialect's lineterminator setting.

In many cases, I've had to write a script (in Python, because tools like sed do not handle newlines and carriage returns like ordinary characters) that replaces in-cell newlines byte by byte with an unused character (vertical tab), and then swaps that character back after parsing the result with a csv reader.

There are two primary cases I have had to deal with:

The true "end-of-line" was always indicated by and in-cell newlines were simply
The end-of-lines were all , except when within a quoted field. (e.g., val1,"first line of cell second line of cell",val3)
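For the second case, it may be worth checking how the standard reader behaves on a small sample before reaching for another parser. The sketch below assumes the end-of-line is "\n" and that the break inside the quoted cell is also "\n" (both are assumptions, since the actual characters are not shown above); the sample data and variable names are illustrative only.

```python
import csv
import io

# Assumed sample: "\n" ends each row and also appears inside one quoted cell.
sample = 'val1,"first line of cell\nsecond line of cell",val3\nval4,val5,val6\n'

# newline="" keeps line endings untranslated, as the csv docs recommend for files.
reader = csv.reader(io.StringIO(sample, newline=""))
rows = list(reader)

for row in rows:
    print(row)
```

If the quoted cell comes back as a single field that still contains the line break, the built-in reader may already cover this case, provided the file is opened with newline="".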

The files I am dealing with are too large to efficiently process in memory, so I would like to know if there's an alternative parser that doesn't automatically terminate a row after a carriage return or newline is encountered.
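One possible way to handle the first case without loading the whole file is to split the stream on the real row terminator in fixed-size chunks, mask in-cell newlines in memory, and hand each record to csv.reader. This is a minimal sketch, not a drop-in parser: the function names iter_records and read_rows are hypothetical, the terminator "\r\n" and in-cell newline "\n" are assumptions, and it assumes the terminator never occurs inside a cell and the placeholder never occurs in the data.

```python
import csv
import io


def iter_records(stream, terminator="\r\n", chunk_size=64 * 1024):
    """Yield records from a text stream, split on an arbitrary terminator."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        parts = buffer.split(terminator)
        # The last piece may be an incomplete record (or end with half a
        # terminator), so keep it in the buffer until more data arrives.
        buffer = parts.pop()
        yield from parts
    if buffer:
        yield buffer  # final record without a trailing terminator


def read_rows(stream, terminator="\r\n", in_cell_newline="\n",
              placeholder="\v", chunk_size=64 * 1024):
    """Parse records lazily, preserving in-cell newlines (assumed inputs)."""
    def masked_records():
        for record in iter_records(stream, terminator, chunk_size):
            # Hide in-cell newlines so csv.reader does not see a row break.
            yield record.replace(in_cell_newline, placeholder)

    for row in csv.reader(masked_records()):
        # Restore the original in-cell newlines after parsing.
        yield [cell.replace(placeholder, in_cell_newline) for cell in row]
```

A real file would be opened with open(path, newline="") so that "\r\n" is not translated before the split; rows are then produced one at a time, so memory use stays bounded by the chunk size plus the longest record.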

Are there any alternatives to the Python csv module that can quote newlines and use arbitrary characters as newlines for reading?
