Feihong Liu

Reputation: 1

pandas read csv with extra commas and quotations in column

I'm reading a basic CSV file whose columns are separated by commas. However, the body column is a string that may itself contain commas and quotation marks. For example, some cells look like "Bahamas\", The" and "Germany, West".

I have tried text = pd.read_table("input.txt", encoding='utf-16', quotechar='"', sep=',') and text = pd.read_table("input.txt", encoding='utf-16', quotechar='"', delimiter=','), but neither works.

Is there a way to get around this problem?

Upvotes: 0

Views: 492

Answers (1)

InTheEconomix

Reputation: 46

Are you able to regenerate the CSV? If so, change the delimiter to a pipe, i.e. | . If not, you may be forced to take the long route, because when both commas and quotes can appear inside a value, code generally has no reliable way to tell which characters are delimiters or quotes and which are part of the value.
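If regenerating the file is possible, reading it back might look like the sketch below. The column names and sample rows are made up for illustration, and it assumes the pipe character never occurs inside a value. Passing quoting=csv.QUOTE_NONE tells pandas to treat quotation marks as ordinary characters.

```python
import csv
import io

import pandas as pd

# Hypothetical pipe-delimited sample; the real file would be opened by path,
# e.g. pd.read_csv("input.txt", encoding="utf-16", sep="|", quoting=csv.QUOTE_NONE)
sample = (
    "id|country|value\n"
    '1|Bahamas", The|10\n'
    "2|Germany, West|20\n"
    "3|France|30\n"
)

# QUOTE_NONE: quotes are kept as literal characters, only "|" splits fields
df = pd.read_csv(io.StringIO(sample), sep="|", quoting=csv.QUOTE_NONE)
print(df)
```

Commas and quotes inside the country column then need no special handling, since only the pipe separates fields.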

A workaround could rely on the position of the column where this problem occurs: first isolate the columns to the left of the troubled column, then isolate all the columns to its right; whatever characters remain make up the troubled column. Can you post a few example rows? It would be good to see a few rows that have this issue and a few that work fine.
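A minimal sketch of that positional approach, assuming only one column can contain stray commas or quotes and that the number of clean columns on each side of it is known. The helper name split_row, the column counts, and the sample rows are hypothetical.

```python
def split_row(line, n_left, n_right):
    """Split a line on commas, keeping n_left clean fields on the left and
    n_right clean fields on the right; everything in between is rejoined
    into a single field (the troubled column)."""
    parts = line.rstrip("\r\n").split(",")
    if len(parts) < n_left + n_right + 1:
        raise ValueError(f"too few fields in line: {line!r}")
    cut = len(parts) - n_right
    left = parts[:n_left]
    right = parts[cut:]
    middle = ",".join(parts[n_left:cut])
    # Drop one pair of enclosing quotes, if present; inner characters are kept as-is
    if len(middle) >= 2 and middle[0] == '"' and middle[-1] == '"':
        middle = middle[1:-1]
    return left + [middle] + right


# Hypothetical rows: one clean column on each side of the troubled one
lines = [
    '1,"Bahamas\\", The",10',
    '2,"Germany, West",20',
    "3,France,30",
]
rows = [split_row(line, n_left=1, n_right=1) for line in lines]
for row in rows:
    print(row)
```

The resulting list of rows can be passed to pd.DataFrame along with the column names. For the real file, the lines would come from open("input.txt", encoding="utf-16"), skipping the header line if there is one.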

Upvotes: 1
