Gilles Cuyaubere

Reputation: 409

python pandas read_csv unable to read character double quoted twice

I'm trying to read a two-column CSV file (error.csv) with a semicolon separator, in which some fields contain semicolons inside double quotes:

col1;col2
2016-04-17_22:34:25.126;"Linux; Android"
2016-04-17_22:34:25.260;"{"g":2}iPhone; iPhone"

And I'm trying:

logs = pd.read_csv('error.csv', na_values="null", sep=';', 
                   quotechar='"', quoting=0)

I understand that the problem comes from having a double-quoted "g" inside my double quotes in line 3, but I can't figure out how to deal with it. Any ideas?

Upvotes: 5

Views: 877

Answers (1)

ChrisP

Reputation: 5942

You will probably need to pre-process the data so that it conforms to the expected CSV format. I doubt pandas will handle this just by changing a parameter or two.
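
As a rough sketch of what such pre-processing could look like, the snippet below doubles the quotes found inside an already quoted field, which is the usual CSV way of escaping them. It assumes the quoted field is the last one on the line and that the first column never contains the separator; `escape_inner_quotes` is a hypothetical helper name, and the sample data is copied from the question.

```python
import csv
import io

def escape_inner_quotes(line, sep=";"):
    """Double any quotes found inside the quoted second field of a line."""
    first, _, rest = line.rstrip("\n").partition(sep)
    # only touch fields that are wrapped in an outer pair of quotes
    if len(rest) >= 2 and rest.startswith('"') and rest.endswith('"'):
        inner = rest[1:-1].replace('"', '""')
        rest = '"' + inner + '"'
    return first + sep + rest

# sample data from the question, inlined so the sketch is self-contained
raw = (
    'col1;col2\n'
    '2016-04-17_22:34:25.126;"Linux; Android"\n'
    '2016-04-17_22:34:25.260;"{"g":2}iPhone; iPhone"\n'
)

cleaned = "\n".join(escape_inner_quotes(line) for line in raw.splitlines())

# the standard csv module now parses every row into exactly two fields
rows = list(csv.reader(io.StringIO(cleaned), delimiter=";"))
```

The cleaned text can then be handed to pandas as a file-like object, for example `pd.read_csv(io.StringIO(cleaned), sep=';', na_values="null")`. Note that a field whose inner quotes are already doubled would be escaped a second time, so this only suits files that are consistently malformed in the way shown.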

If there are only two columns, and the first never contains a semi-colon, then you could split the lines on the first semi-colon:

import pandas as pd

records = []
with open('error.csv', 'r') as fh:
    # first row is a header
    header = next(fh).strip().split(';')

    for rec in fh:
        # split only on the first semi-colon
        date, dat = rec.strip().split(';', maxsplit=1)
        # assemble records, removing the outer quotes from the second column
        records.append((date, dat.strip('"')))

# create a data frame
df = pd.DataFrame.from_records(records, columns=header)

Because the file is read by hand, the first column will contain plain strings. If you want proper dates there, you will have to parse them yourself, for example with the datetime module.
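
A minimal sketch of that date parsing, assuming every timestamp follows the `YYYY-MM-DD_HH:MM:SS.mmm` pattern seen in the question; `parse_timestamp` is a hypothetical helper name:

```python
from datetime import datetime

def parse_timestamp(text):
    # format matches values such as '2016-04-17_22:34:25.126'
    return datetime.strptime(text, "%Y-%m-%d_%H:%M:%S.%f")

parsed = parse_timestamp("2016-04-17_22:34:25.126")
```

It could be applied to the data frame's first column with something like `df[header[0]] = df[header[0]].map(parse_timestamp)`. A value that does not match the format raises a `ValueError`, so rows with missing or odd timestamps would need separate handling.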

Upvotes: 1
