shadowtalker

Pandas and "doubled double quote" escaping in CSV

Sometimes CSV data is formatted like this:

col1,col2,col3
a,b,"this field has an embedded quote character ("") in it"

It is intended to be parsed as:

col1 | col2 | col3
a    | b    | this field has an embedded quote character (") in it

That is, the field-quoting character is escaped by doubling it.
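To make the escaping rule concrete, here is a small sketch using only Python's standard csv module (not pandas). It writes the example row with the default "excel" dialect and shows that the embedded quote comes out doubled. The lineterminator="\n" argument is an assumption added only so the output has no trailing \r.

```python
import csv
import io

# The third field from the example, containing a literal " character.
field = 'this field has an embedded quote character (") in it'

buf = io.StringIO()
# The default "excel" dialect uses quotechar='"' and doublequote=True, so the
# field is wrapped in quotes and the embedded " is written as "".
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["a", "b", field])

print(buf.getvalue(), end="")
# a,b,"this field has an embedded quote character ("") in it"
```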

Python's csv module handles this just fine: csv.reader parses it correctly as long as the dialect's doublequote attribute (csv.Dialect.doublequote) is True, which is the default.
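A minimal sketch of the csv.reader behaviour, assuming the sample data is held in a string and wrapped in io.StringIO instead of being read from a file:

```python
import csv
import io

raw = 'col1,col2,col3\na,b,"this field has an embedded quote character ("") in it"\n'

# doublequote=True is already the default for the "excel" dialect; it is
# passed explicitly here only to show which option controls the "" escape.
rows = list(csv.reader(io.StringIO(raw), doublequote=True))

print(rows[1][2])
# this field has an embedded quote character (") in it
```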

How can you do this in Pandas?

Answers (1)

shadowtalker

Note: I found the answer before I had even finished posting.

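Use pd.read_csv(..., doublequote=True). Both doublequote=True and quotechar='"' are already the pandas defaults, so doubled quotes are handled even if you don't pass them explicitly: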

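import csv
import pandas as pd

# doublequote=True: inside a quoted field, "" is read as a single literal "
# quoting: adjust to match how fields are quoted in your file
data = pd.read_csv('data.csv', quotechar='"', doublequote=True,
                   quoting=csv.QUOTE_NONNUMERIC)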

Swap QUOTE_NONNUMERIC for QUOTE_MINIMAL (the pandas default) or another csv.QUOTE_* constant, as appropriate for your data.
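For a self-contained check, the sketch below feeds the sample data from the question to pd.read_csv through io.StringIO (standing in for 'data.csv') and leaves quoting at its default, QUOTE_MINIMAL, since col1 and col2 in the sample are unquoted text. This is an illustration under those assumptions, not the only valid set of options.

```python
import io

import pandas as pd

# Same sample data as in the question; io.StringIO stands in for 'data.csv'.
raw = 'col1,col2,col3\na,b,"this field has an embedded quote character ("") in it"\n'

# quotechar='"' and doublequote=True are pandas' defaults; they are spelled
# out here to show which options control the "" escape.
df = pd.read_csv(io.StringIO(raw), quotechar='"', doublequote=True)

print(df.loc[0, "col3"])
# this field has an embedded quote character (") in it
```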
