ParserError with value containing delimiter

Question

I have a CSV that looks like this:

"ID"|"Description"|"Date"
"1234"|"good text"|"2019-10-12"
"9012"|"bad"text|here""|"2018-01-12"

I am trying to write to a new CSV that looks like this:

+--------+------------------+--------------+
|  "ID"  |  "Description"   |    "Date"    |
+--------+------------------+--------------+
| "1234" | "good text"      | "2019-10-12" |
| "9012" | "bad text|here"  | "2018-01-12" |
+--------+------------------+--------------+

The issue is that the delimiter "|" is contained in one of the description values, namely "bad|text". So when I run something like:

df = pd.read_csv(csv_file, encoding='utf-16', sep='|')

I see:

pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4

The only resolution I see is to skip the rows containing the delimiter but obviously I want them:

Python Pandas Error tokenizing data

Any suggestions?

Note that file is encoded as UTF-16.

ParserError with value containing delimiter

Answers (1)

Related Questions