kjmerf
kjmerf

Reputation: 4335

ParserError with value containing delimiter

I have a CSV that looks like this:

"ID"|"Description"|"Date"
"1234"|"good text"|"2019-10-12"
"9012"|"bad"text|here""|"2018-01-12"

I am trying to write to a new CSV that looks like this:

+--------+------------------+--------------+
|  "ID"  |  "Description"   |    "Date"    |
+--------+------------------+--------------+
| "1234" | "good text"      | "2019-10-12" |
| "9012" | "bad text|here"  | "2018-01-12" |
+--------+------------------+--------------+

The issue is that the delimiter "|" is contained in one of the description values, namely "bad|text". So when I run something like:

df = pd.read_csv(csv_file, encoding='utf-16', sep='|')

I see:

pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4

The only resolution I see is to skip the rows containing the delimiter but obviously I want them:

Python Pandas Error tokenizing data

Any suggestions?

Note that file is encoded as UTF-16.

Upvotes: 1

Views: 291

Answers (1)

Quant Christo
Quant Christo

Reputation: 1430

Here is solution, the key engine='python-fwf', regex seperator and some cleaning.

import pandas as pd


df = pd.read_csv('file_in.csv', engine='python-fwf', sep='"|"', quotechar='"')
df.to_csv('file_out.csv', sep='|', index=False)

Output:

print(df)
     ID Description                 Date
0  1234   good text           2019-10-12
1  9012   bad" text  here ""|"2018-01-12

file_out.csv:

ID|Description|Date
1234|good text|2019-10-12
9012|"bad"" text"|"here """"|""2018-01-12"

Upvotes: 1

Related Questions