Reputation: 1754
I have searched for ways to read CSV files whose values contain commas, but I have not found one that works using pandas alone.
For example, the CSV file has columns "A", "B", "C", "D", "E", "F", and only the values in column "C" contain commas.
The values in column C are strings.
I have tried this:
pd.read_csv('my.csv',quotechar="'")
but it returns
CParserError: Error tokenizing data. C error: Expected 6 fields in line 1553, saw 7
Update:
Some values in column C start with a comma, like ",hello", while others have commas in the middle, like "hello,hello,hello".
How can I set the quotechar parameter to solve my problem?
Upvotes: 1
Views: 5763
Reputation: 163
I ran into the same kind of problem when using pandas to parse a CSV file containing SQL queries, which meant commas inside some columns.
To solve it, we had to use a column separator other than a comma and set the sep parameter of pandas.read_csv accordingly, like this:
df = pd.read_csv(path, sep=';')
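To illustrate why this works, here is a minimal self-contained sketch. The sample data is made up for the example (it reuses the column names and the ",hello" / "hello,hello,hello" values from the question), and it assumes the file can actually be produced with semicolons as the delimiter:

```python
import io

import pandas as pd

# Hypothetical sample: semicolon-delimited, with commas inside column C only.
sample = io.StringIO(
    "A;B;C;D;E;F\n"
    "1;2;,hello;4;5;6\n"
    "7;8;hello,hello,hello;10;11;12\n"
)

# With sep=';' the commas are ordinary characters, so column C stays intact.
df = pd.read_csv(sample, sep=";")
c_values = df["C"].tolist()
print(c_values)
```

Since the comma no longer has any special meaning, no quoting is needed, as long as the semicolon itself never appears inside a value.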
Personally, since I'm lazy, if I were you I would just change (or ask someone to change) the delimiter in your input CSV from a comma to something else, like a semicolon.
But if you can't, here's something I found while looking for a solution:
Pandas Read CSV with string delimiters via regex
As you can see in that code, a regex was used. It let the user parse the CSV file even though the delimiters were not clearly defined for pandas, by stating in the regex which values to extract and how to extract them.
I'm no expert in regex, but it might fit your needs.
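As a rough sketch of the regex idea applied to the file described in the question (not the code from the linked post): if only column C can contain commas, every line has two comma-free fields before it and three after it, so a pattern can capture whatever is left in the middle as C. The names ROW_PATTERN and split_row are made up for this example, and it assumes the values are unquoted and never span several lines:

```python
import re

COLUMNS = ["A", "B", "C", "D", "E", "F"]

# Two comma-free fields (A, B), then everything in between as C,
# then three comma-free fields (D, E, F) up to the end of the line.
ROW_PATTERN = re.compile(r"^([^,]*),([^,]*),(.*),([^,]*),([^,]*),([^,]*)$")


def split_row(line):
    """Split one raw line into six fields, keeping the commas inside C."""
    match = ROW_PATTERN.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"line has fewer than 6 fields: {line!r}")
    return list(match.groups())


# Hypothetical sample lines modelled on the values in the question.
sample_lines = [
    "1,2,,hello,4,5,6",
    "7,8,hello,hello,hello,10,11,12",
    "13,14,plain,16,17,18",
]
rows = [split_row(line) for line in sample_lines]
print(rows)
```

The resulting list can then be passed to pd.DataFrame(rows, columns=COLUMNS). Note that every field comes back as a string, so numeric columns would still need converting afterwards.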
Upvotes: 1