bhargav reddy

Reputation: 29

ParserError: Error tokenizing data. C error: Expected 4 fields in line 6, saw 5

I am new to Python and the pandas library. I am trying to read a CSV file with pandas on Windows 10 and I get the error shown in the title. Strangely, the same code runs without any error on some other PCs.

(Screenshot: output without sep=";")

1) I tried passing sep=";", but the output is not what I expect: the data is read into a single series instead of a dataframe.

(Screenshot: output using sep=";")

2) I get partial output with nrows=5, but the same parser error with nrows=6, so the parser is clearly having trouble at line 6.

I am including a snapshot of the first few lines of the dataset for reference.

(Screenshot: first 30 rows of the dataset)

Upvotes: 0

Views: 4392

Answers (1)

havanagrawal

Reputation: 1039

Explanation

The problem is that pd.read_csv, by default, uses the first line of the file as the header row. The first line of your file has 4 values:

citrus-fruit, semi-finished fruit, margarine, ready soup

It then assumes that every row has at most 4 comma-separated values; if a row has fewer, the missing values are filled in as NaN. When it tries to parse line 6, i.e.

whole milk, butter, yogurt, rice, abrasive cleaner

it sees one extra value (abrasive cleaner) and raises the ParserError.

When you use ; as the separator, no line contains a ;, so each line is read whole as a single field. The resulting dataframe has a single column that holds each line as a string.
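To see both behaviours without the original file, here is a minimal sketch that feeds an in-memory sample to read_csv. Lines 1 and 6 are the ones quoted above; lines 2 to 5 are invented filler, and the helper name default_read_error is hypothetical.

```python
import io

import pandas as pd

# Lines 1 and 6 are quoted from the question; lines 2-5 are invented filler
# with at most 4 fields each.
SAMPLE = "\n".join([
    "citrus-fruit,semi-finished fruit,margarine,ready soup",  # line 1: 4 fields
    "tropical fruit,yogurt,coffee",
    "whole milk",
    "pip fruit,yogurt,cream cheese,meat spreads",
    "other vegetables,whole milk",
    "whole milk,butter,yogurt,rice,abrasive cleaner",         # line 6: 5 fields
]) + "\n"


def default_read_error(text):
    """Return the ParserError message from a default read_csv call, or None."""
    try:
        pd.read_csv(io.StringIO(text))
    except pd.errors.ParserError as exc:
        return str(exc)
    return None


# Default comma separator: line 1 fixes the column count, line 6 overflows it.
message = default_read_error(SAMPLE)
print(message)

# With sep=";" no line is ever split, so everything lands in one column.
one_column = pd.read_csv(io.StringIO(SAMPLE), sep=";")
print(one_column.shape)
```

The first print should show the same kind of message as in the question title, and the second should show a frame with a single column.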

Solution

It depends on what you're trying to achieve. If you really do want to read this in as a CSV, you can:

  1. Add a header row to your CSV file, like so (assuming you have 11 columns):
item1, item2, item3, item4 ........ item11
  2. Use the names argument to read_csv (again assuming you have 11 columns), like so:
pd.read_csv(filename, names=['item' + str(i) for i in range(1, 12)])
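If you don't know the number of columns in advance, one option is to count the fields in the widest row first and build the names from that. The sketch below assumes no field contains a quoted comma (it splits on "," naively) and reuses an invented in-memory sample in place of your file; passing header=None keeps the first line as data instead of using it as the header.

```python
import io

import pandas as pd

# Invented sample standing in for the real file; only lines 1 and 6 come from the question.
SAMPLE = "\n".join([
    "citrus-fruit,semi-finished fruit,margarine,ready soup",
    "tropical fruit,yogurt,coffee",
    "whole milk",
    "pip fruit,yogurt,cream cheese,meat spreads",
    "other vegetables,whole milk",
    "whole milk,butter,yogurt,rice,abrasive cleaner",
]) + "\n"

# Naive width scan: assumes plain commas and no quoted fields.
max_fields = max(len(line.split(",")) for line in SAMPLE.splitlines())

# item1 .. itemN, one name per column of the widest row.
names = ["item" + str(i) for i in range(1, max_fields + 1)]

# Shorter rows are padded with NaN up to the widest row.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, names=names)
print(df)
```

For a real file, you would scan the file's lines instead of SAMPLE.splitlines() and pass the filename to read_csv.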

Upvotes: 1
