Reputation: 7597
I am having a .csv
file like this-
A|B|C|D
1|"|1|"1
2|2|2|2
3|"|3|3
4|4|4|4
5|"five"|5|5
6|6|6|6
I import this .csv
file as -
data=pd.read_csv('C:\Python Work\inverted_commas.csv',sep='|',dtype={'B':object,'C':object,'D':object},names=['A','B','C','D'],skiprows=1)
The result looks like this:
A B C D
1 |1|1 NaN NaN
2 2 2 2
3 |3|3\r\n4|4|4|4\r\n5|five" 5 5
6 6 6 6
|
is a separator for this .csv
file. As we had unmatched double quotes "
in line 1 and 3 (top line is header), the entire import went wrong. I wanted the result like -
A B C D
1 " 1 "1
2 2 2 2
3 " 3 3
4 4 4 4
5 "five" 5 5
6 6 6 6
SAS
interestingly imports this file correctly in this way. Is there a way or an option in .read_csv
where we could specify that the moment one encounters a delimiter, irrespective of whether the beginning quote is matched with the closing quote or not, the column should be filled in with that value, as shown above?
Upvotes: 1
Views: 1319
Reputation: 862921
You can use parameter quoting=3
:
import pandas as pd
from pandas.compat import StringIO
temp=u"""A|B|C|D
1|"|1|"1
2|2|2|2
3|"|3|3
4|4|4|4
5|"five"|5|5
6|6|6|6"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
df = pd.read_csv(StringIO(temp), sep="|", quoting=3)
print (df)
A B C D
0 1 " 1 "1
1 2 2 2 2
2 3 " 3 3
3 4 4 4 4
4 5 "five" 5 5
5 6 6 6 6
Upvotes: 1