Reading csv with pandas - dealing with imbalanced rows

Question

I have more than 1 million rows, and a very long text field leaves some of them imbalanced: some rows end up with more columns than my header. I worked around this with the following:

read_csv('filename.csv', error_bad_lines=False)

The problem is that there also appear to be some rows with fewer columns than my header, so some fields shift.
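To see how many rows are too long or too short before deciding on a fix, one option is to count the fields per row with the standard csv module. This is only a diagnostic sketch: the function name field_count_histogram and the three-column sample data are made up for illustration and are not from the real file.

```python
import csv
import io
from collections import Counter


def field_count_histogram(lines, expected, max_examples=5):
    """Count rows by number of fields and collect a few offending rows.

    `lines` is any iterable of text lines (an open file works too).
    """
    counts = Counter()
    bad_rows = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        counts[len(row)] += 1
        if len(row) != expected and len(bad_rows) < max_examples:
            bad_rows.append((row_number, len(row)))
    return counts, bad_rows


# Hypothetical sample: one row with an extra comma, one row that is too short.
sample = io.StringIO(
    "id,textField,score\n"
    "1,hello,10\n"
    "2,hello, world,20\n"
    "3,broken\n"
)
counts, bad_rows = field_count_histogram(sample, expected=3)
print(counts)
print(bad_rows)
```

For the real file, the same function could be called on open('filename.csv', newline='', encoding='utf-8'), assuming the file is UTF-8 encoded, with expected=10.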

How can I fix this? Is there a way to make that long text field (which I blame for this) be treated as a single field?

Edit after comment

The field delimiter is a comma. When I run df.dtypes, all fields but one appear to be object; however, the original data has int and datetime fields, which pandas reads as objects.
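Columns read as object usually mean that at least one value in them could not be parsed as a number or date, which is what shifted fields would cause. A minimal sketch of coercing such columns afterwards and finding the rows that failed is below. The column names id, score and datetime are taken from the header, but the tiny DataFrame is invented for illustration.

```python
import pandas as pd

# Invented stand-in for a frame whose columns were all read as strings.
df = pd.DataFrame({
    "id": ["1", "2", "oops"],
    "score": ["10", "20", "30"],
    "datetime": ["2015-01-01 10:00:00", "not a date", "2015-01-03 12:30:00"],
})

# errors="coerce" turns unparseable values into NaN / NaT instead of raising.
for col in ["id", "score"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")
df["datetime"] = pd.to_datetime(df["datetime"], errors="coerce")

# Rows where coercion failed are likely the ones whose fields shifted.
bad = df[df[["id", "score", "datetime"]].isna().any(axis=1)]
print(df.dtypes)
print(bad)
```

This does not repair anything; it only makes the damaged rows visible so they can be inspected or dropped.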

Edit after comment 2

Here is the header of my .csv file:

id(int),textField(string),id2(char),score(int),type(string),length(int),name(string),datetime(datetime),size(int),email(string)

The main problem is the textField column. The other fields cannot contain any foul characters that would break CSV syntax. However, textField is created by users and can contain anything in Unicode: emojis, non-English characters, funny quotes, etc.
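If textField really is the only column that can contain commas, one possible repair is to split each line from both ends: take the id from the left, the eight clean fields from the right, and treat whatever remains in the middle as textField. The sketch below assumes the ten-column layout from the header above, one record per physical line, and no CSV quoting. The helper name split_row is hypothetical.

```python
def split_row(line, n_columns=10, text_index=1):
    """Split a line so that extra commas end up inside the text column.

    Returns a list of n_columns fields, or None if the line is too short.
    Assumes only the column at text_index may contain commas.
    """
    line = line.rstrip("\r\n")
    head = line.split(",", text_index)           # fields before the text, plus the rest
    if len(head) <= text_index:
        return None
    leading, rest = head[:text_index], head[text_index]
    n_trailing = n_columns - text_index - 1      # clean fields after the text column
    tail = rest.rsplit(",", n_trailing)          # split from the right only
    if len(tail) != n_trailing + 1:
        return None
    return leading + tail


# Made-up example line with two extra commas inside textField.
good = split_row(
    "1,hi, there, friend,a,5,post,12,bob,2015-01-01 10:00:00,300,bob@example.com"
)
short = split_row("2,only three,fields")
print(good)
print(short)
```

Lines that come back as None are the ones with too few fields. Those may be caused by line breaks inside textField, which this sketch does not handle.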
