ragesz
ragesz

Reputation: 9527

python pandas read_csv quotechar does not work

I've read this, this and this posts but despite I don't know why quotechar does not work at pd.read_csv() (Python 3, pandas 0.18.0 and 0.18.1). And how could I read a dataframe like this:

"column1","column2", "column3", "column4", "column5", "column6"
"AM", 7, "1", "SD", "SD", "CR"
"AM", 8, "1,2 ,3", "PR, SD,SD", "PR ; , SD,SD", "PR , ,, SD ,SD"
"AM", 1, "2", "SD", "SD", "SD"

I want the following result:

Out[116]: 
  column1  column2 column3    column4       column5        column6
0      AM        7       1         SD            SD             CR
1      AM        8  1,2 ,3  PR, SD,SD  PR ; , SD,SD  PR , ,, SD,SD
2      AM        1       2         SD            SD             SD

Thank you!!

Upvotes: 12

Views: 31632

Answers (2)

yoonghm
yoonghm

Reputation: 4625

Another solution is to use a proper regular expression instead of the simple \s+. We need to find comma (,) which is not within quotation marks:

pd.read_csv(file, 
            sep=', (?=(?:"[^"]*?(?: [^"]*)*))|, (?=[^",]+(?:,|$))',
            engine='python')

The expression is taken from here.

Upvotes: 0

ptrj
ptrj

Reputation: 5212

Pandas doc on separators in read_csv():

Separators longer than 1 character and different from '\s+' will be interpreted as regular expressions, will force use of the python parsing engine and will ignore quotes in the data.

Try using this instead (sep by default set to a comma):

pd.read_csv(file, skipinitialspace = True, quotechar = '"')

Upvotes: 23

Related Questions