Python Dong
Python Dong

Reputation: 95

Python Reading a CSV with Double-Double Quoted elements and a Quoted line

So I have csv data like this:

1, 2, 3, bla bla bla, 4, 5;
"1, 2, 3, ""bla, bla, bla"", 4, 5";
"6, 7, 8, ""more, bla, bla"", 9, 10";
6, 7, 8, more bla bla, 9, 10;

In essence: a certain column has a string with the separator in it and this is in double-double quotes, while the whole line is in quotes too.

I've tried it with pandas:

df = pd.read_csv("data.csv", sep=',', skipinitialspace=True, quotechar='"', doublequote=True)

But because some of the lines are in quotes, it puts it into the first column:

column1                        column12    column13    column14    column15    column16
1                              2           3         bla bla bla   4           5
1,2,3,"bla, bla, bla", 4, 5    nan         nan         nan         nan         nan
6,7,8,"more, bla, bla",9,10    nan         nan         nan         nan         nan
6                              7           8         more bla bla  9           10

How do I get get these quoted lines to act accordingly?

Upvotes: 3

Views: 75

Answers (1)

Martin Evans
Martin Evans

Reputation: 46759

One approach would be to pre-process it before loading it into Pandas:

import csv
import pandas as pd
import io

data = []

with open('input.csv') as f_input:
    for line in f_input:
        line = line.strip('";\n').replace('""', '"')
        row = next(csv.reader(io.StringIO(line, newline=''), skipinitialspace=True))
        data.append(row)

df = pd.DataFrame(data)
print(df)

Giving:

   0  1  2               3  4   5
0  1  2  3     bla bla bla  4   5
1  1  2  3   bla, bla, bla  4   5
2  6  7  8  more, bla, bla  9  10
3  6  7  8    more bla bla  9  10

Or you could write out the fixed version for later use:

with open('output.csv', 'w', newline='') as f_output:
    csv.writer(f_output).writerows(data)

Upvotes: 2

Related Questions