Reputation: 95
So I have csv data like this:
1, 2, 3, bla bla bla, 4, 5;
"1, 2, 3, ""bla, bla, bla"", 4, 5";
"6, 7, 8, ""more, bla, bla"", 9, 10";
6, 7, 8, more bla bla, 9, 10;
In essence: a certain column has a string with the separator in it and this is in double-double quotes, while the whole line is in quotes too.
I've tried it with pandas:
df = pd.read_csv("data.csv", sep=',', skipinitialspace=True, quotechar='"', doublequote=True)
But because some of the lines are in quotes, it puts it into the first column:
column1 column12 column13 column14 column15 column16
1 2 3 bla bla bla 4 5
1,2,3,"bla, bla, bla", 4, 5 nan nan nan nan nan
6,7,8,"more, bla, bla",9,10 nan nan nan nan nan
6 7 8 more bla bla 9 10
How do I get get these quoted lines to act accordingly?
Upvotes: 3
Views: 75
Reputation: 46759
One approach would be to pre-process it before loading it into Pandas:
import csv
import pandas as pd
import io
data = []
with open('input.csv') as f_input:
for line in f_input:
line = line.strip('";\n').replace('""', '"')
row = next(csv.reader(io.StringIO(line, newline=''), skipinitialspace=True))
data.append(row)
df = pd.DataFrame(data)
print(df)
Giving:
0 1 2 3 4 5
0 1 2 3 bla bla bla 4 5
1 1 2 3 bla, bla, bla 4 5
2 6 7 8 more, bla, bla 9 10
3 6 7 8 more bla bla 9 10
Or you could write out the fixed version for later use:
with open('output.csv', 'w', newline='') as f_output:
csv.writer(f_output).writerows(data)
Upvotes: 2