Sometimes CSV data is formatted like this:
col1,col2,col3
a,b,"this field has an embedded quote character ("") in it"
It is intended to be parsed as:
col1 | col2 | col3
a | b | this field has an embedded quote character (") in it
That is, the field-quoting character is escaped by doubling it.
Python's csv module (csv.reader) handles this just fine, as long as csv.Dialect.doublequote is True.
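A minimal sketch of the csv.reader behaviour described above. It assumes the sample data is held in a string and wrapped in io.StringIO instead of being read from a file, so the example is self-contained:

```python
import csv
import io

# Sample data from the question; "" inside a quoted field is an escaped quote.
raw = 'col1,col2,col3\na,b,"this field has an embedded quote character ("") in it"\n'

# doublequote=True is already the default; it is spelled out here for clarity.
reader = csv.reader(io.StringIO(raw), quotechar='"', doublequote=True)
rows = list(reader)

for row in rows:
    print(" | ".join(row))
```

This prints the two rows in the `col1 | col2 | col3` layout shown above, with the doubled quote collapsed to a single `"`.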
How can you do this in Pandas?
Note: I found the answer before I had even finished posting.
Use pd.read_csv(..., doublequote=True):
import csv
import pandas as pd
# doublequote=True makes "" inside a quoted field parse as a single literal "
data = pd.read_csv('data.csv', quotechar='"', doublequote=True, quoting=csv.QUOTE_NONNUMERIC)
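Swap out QUOTE_NONNUMERIC for QUOTE_MINIMAL, or another csv.QUOTE_* constant, as appropriate. Note that quotechar='"', doublequote=True and QUOTE_MINIMAL are already the read_csv defaults, so passing them explicitly mainly documents intent.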
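A self-contained sketch of the pandas version, assuming the question's sample data is held in a string and passed through io.StringIO in place of 'data.csv'. It uses the QUOTE_MINIMAL variant and compares the result against a plain read_csv call with all defaults:

```python
import csv
import io

import pandas as pd

# Sample data from the question; io.StringIO stands in for 'data.csv'.
raw = 'col1,col2,col3\na,b,"this field has an embedded quote character ("") in it"\n'

# Explicit settings, using QUOTE_MINIMAL instead of QUOTE_NONNUMERIC.
df = pd.read_csv(io.StringIO(raw), quotechar='"', doublequote=True,
                 quoting=csv.QUOTE_MINIMAL)

# The same parse relying only on read_csv's defaults.
df_default = pd.read_csv(io.StringIO(raw))

print(df.loc[0, 'col3'])
```

Both DataFrames hold one row whose col3 contains a single embedded `"`.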