Reputation: 117
I have an exported CSV dataset that contains HTML text entered by users, and I need to turn it into a DataFrame.
The columns that may contain extra commas are quoted with ", but the parser still treats the commas inside them as separators.
This is the code I'm using. I've already tried solutions from a GitHub issue and from another post here.
pd.read_csv(filePath,sep=',', quotechar='"', error_bad_lines=False)
This results in:
Here is the CSV file itself, with the columns and the first entry:
I don't know what the issue is; quotechar was supposed to handle this. Could it be the extra " characters inside the quoted string?
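For comparison, here is a minimal sketch of input that quotechar does handle. It uses Python's built-in csv module instead of pandas (an assumption made only to keep the example dependency-free; pd.read_csv with its default doublequote=True follows the same convention). The sample rows are made up. A comma inside a quoted field is kept, and a literal double quote inside a quoted field has to be written twice ("") to survive.

```python
import csv
import io

# Made-up sample: the html field is quoted, and the literal double quotes in
# the second row are doubled (""), which is the usual CSV escaping convention.
data = (
    'id,html,tag\n'
    '1,"<b>hello</b>, world",x\n'
    '2,"<a href=""u"">link</a>, more",y\n'
)

rows = list(csv.reader(io.StringIO(data), quotechar='"'))
for row in rows[1:]:
    print(row)
# ['1', '<b>hello</b>, world', 'x']
# ['2', '<a href="u">link</a>, more', 'y']
```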
Upvotes: 0
Views: 621
Reputation: 28
Here's the issue you're running into:
You set the double quote (") as your quotechar. Unfortunately, your text also contains double quotes:
<a href ="....">
So the first quote inside that anchor tag closes the quoted field early, and the next few commas are NOT treated as being inside quotes. Your best bet is probably to regenerate the original CSV file with a different quotechar, one that doesn't appear anywhere in your text.
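A minimal sketch of both the failure and the suggested fix, again using the standard-library csv module with made-up data. Using | as the replacement quotechar is purely an assumption; any character that never occurs in your data would work, so the sketch checks for it before writing.

```python
import csv
import io

# Made-up row whose HTML field contains unescaped double quotes, as in the question.
header = "id,html,tag\n"
bad_row = '1,"see <a href="u">link</a>, more",x\n'

# With quotechar='"', the quote before u closes the quoted field early,
# so the comma after </a> is treated as a separator: 4 fields instead of 3.
rows = list(csv.reader(io.StringIO(header + bad_row), quotechar='"'))
print(rows[1])  # ['1', 'see <a href=u">link</a>', ' more"', 'x']

# Regenerate the file with a quote character that never appears in the data.
QUOTE = "|"  # assumption: chosen only for this example
records = [["id", "html", "tag"], ["1", 'see <a href="u">link</a>, more', "x"]]
if any(QUOTE in field for rec in records for field in rec):
    raise ValueError(f"{QUOTE!r} occurs in the data; pick another quotechar")

buf = io.StringIO()
writer = csv.writer(buf, quotechar=QUOTE, quoting=csv.QUOTE_ALL)
writer.writerows(records)

# Reading it back with the same quotechar keeps the commas and quotes intact.
buf.seek(0)
fixed = list(csv.reader(buf, quotechar=QUOTE))
print(fixed[1])  # ['1', 'see <a href="u">link</a>, more', 'x']
```

When loading the regenerated file into pandas, pass the same character, e.g. pd.read_csv(filePath, quotechar='|').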
Upvotes: 1