cph_sto
cph_sto

Reputation: 7597

How to deal with Unmatched Quotes in .csv file while importing with read_csv(..) in pandas python

I am having a .csv file like this-

A|B|C|D 
1|"|1|"1
2|2|2|2 
3|"|3|3 
4|4|4|4
5|"five"|5|5
6|6|6|6

I import this .csv file as -

data=pd.read_csv('C:\Python Work\inverted_commas.csv',sep='|',dtype={'B':object,'C':object,'D':object},names=['A','B','C','D'],skiprows=1)

The result looks like this:

A                           B    C    D
1                        |1|1  NaN  NaN
2                           2    2    2
3  |3|3\r\n4|4|4|4\r\n5|five"    5    5
6                           6    6    6 

| is a separator for this .csv file. As we had unmatched double quotes " in line 1 and 3 (top line is header), the entire import went wrong. I wanted the result like -

A         B    C    D
1         "    1   "1
2         2    2    2
3         "    3    3
4         4    4    4
5    "five"    5    5
6         6    6    6

SAS interestingly imports this file correctly in this way. Is there a way or an option in .read_csv where we could specify that the moment one encounters a delimiter, irrespective of whether the beginning quote is matched with the closing quote or not, the column should be filled in with that value, as shown above?

Upvotes: 1

Views: 1319

Answers (1)

jezrael
jezrael

Reputation: 862921

You can use parameter quoting=3:

import pandas as pd
from pandas.compat import StringIO

temp=u"""A|B|C|D 
1|"|1|"1
2|2|2|2 
3|"|3|3 
4|4|4|4
5|"five"|5|5
6|6|6|6"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
df = pd.read_csv(StringIO(temp), sep="|", quoting=3)

print (df)
   A       B  C  D 
0  1       "  1  "1
1  2       2  2  2 
2  3       "  3  3 
3  4       4  4   4
4  5  "five"  5   5
5  6       6  6   6

Upvotes: 1

Related Questions