Reputation: 29
I am new to Python and the pandas library. I am trying to read a CSV file with pandas on Windows 10, and I am getting the above-mentioned error. What's strange is that the same code runs without any error on some other PCs. (Screenshot: without sep=";")
1) I have tried including sep=";", but the output is not what I expect (the data is read into a Series instead of a DataFrame). (Screenshot: using sep=";")
2) I get partial output when I use nrows="5", but I get the same parser error when I use nrows="6" (it is clear that the program runs into trouble at line 6).
I am including a snapshot of the first few lines of the data set for reference. (Screenshot: first 30 rows of the dataset)
Upvotes: 0
Views: 4392
Reputation: 1039
Explanation
The problem is that when you read the CSV using pd.read_csv, it uses the first line of the file as headers. Your file has 4 such values:
citrus-fruit, semi-finished fruit, margarine, ready soup
pandas then assumes that every row has at most 4 comma-separated values; if a row has fewer, the missing values are treated as blank (NaN). When it tries to parse line 6, i.e.
whole milk, butter, yogurt, rice, abrasive cleaner
it sees one extra value (abrasive cleaner) and throws an error.
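A minimal sketch that reproduces this behaviour in memory. The first and sixth lines are the ones quoted above; the "placeholder" rows in between are invented for illustration, since the real file is not shown in full, and the default C parser is assumed.

```python
import io

import pandas as pd

# Line 1 and line 6 are quoted from the question; lines 2-5 are made-up filler.
sample = (
    "citrus-fruit,semi-finished fruit,margarine,ready soup\n"
    "placeholder a,placeholder b,placeholder c\n"
    "placeholder d\n"
    "placeholder e,placeholder f,placeholder g,placeholder h\n"
    "placeholder i,placeholder j\n"
    "whole milk,butter,yogurt,rice,abrasive cleaner\n"
)

message = ""
try:
    # The first line is taken as a 4-column header, so line 6 has one field too many.
    pd.read_csv(io.StringIO(sample))
except pd.errors.ParserError as exc:
    message = str(exc)

print(message)
```

Rows 2 to 5 are shorter than the header and are accepted without complaint; only the row that is longer than the header stops the parser.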
When you use ; as a separator, pandas reads each line without ever encountering a ;, so the resulting dataframe has a single column, with each line stored as one string.
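The single-column result can be checked with a small in-memory sketch. The sample text is an assumption: only its first and last lines come from the question, and the "placeholder" rows are invented filler.

```python
import io

import pandas as pd

# Line 1 and line 6 are quoted from the question; lines 2-5 are made-up filler.
sample = (
    "citrus-fruit,semi-finished fruit,margarine,ready soup\n"
    "placeholder a,placeholder b,placeholder c\n"
    "placeholder d\n"
    "placeholder e,placeholder f,placeholder g,placeholder h\n"
    "placeholder i,placeholder j\n"
    "whole milk,butter,yogurt,rice,abrasive cleaner\n"
)

# No line contains ";", so every line is parsed as exactly one field.
df = pd.read_csv(io.StringIO(sample), sep=";")

print(df.shape)
print(df.columns.tolist())
```

The whole first line becomes the name of the only column, and the commas stay inside each string.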
Solution
It depends on what you're trying to achieve. If you really do want to read this in as a CSV, you can:
1) Add a header line to the file that names every column (assuming you have 11 columns), like so: item1, item2, item3, item4 ........ item11
2) Pass the names argument to read_csv (again assuming you have 11 columns), like so: pd.read_csv(filename, names=['item' + str(i) for i in range(11)])
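As a rough sketch of the names approach, the column count can be measured from the data instead of being assumed to be 11. The sample text is hypothetical apart from its first and last lines, and the naive comma split used to find the widest row is my own addition; it would miscount fields that contain quoted commas.

```python
import io

import pandas as pd

# Line 1 and line 6 are quoted from the question; lines 2-5 are made-up filler.
sample = (
    "citrus-fruit,semi-finished fruit,margarine,ready soup\n"
    "placeholder a,placeholder b,placeholder c\n"
    "placeholder d\n"
    "placeholder e,placeholder f,placeholder g,placeholder h\n"
    "placeholder i,placeholder j\n"
    "whole milk,butter,yogurt,rice,abrasive cleaner\n"
)

# Widest row in the data (naive split, assumes no quoted commas).
width = max(len(line.split(",")) for line in sample.splitlines())

# Passing names explicitly means the first line is read as data, not as a header.
names = ["item" + str(i) for i in range(width)]
df = pd.read_csv(io.StringIO(sample), names=names)

print(df.shape)
print(df.tail(1))
```

Rows shorter than the widest one are padded with NaN. For a file on disk, the same width could be computed by looping over the open file before calling read_csv.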
Upvotes: 1