Wrong row count for CSV file in python

Question

I am processing a csv file and before that I am getting the row count using the below code.

total_rows=sum(1 for row in open(csv_file,"r",encoding="utf-8"))

The code was written with the help given in this link. However, total_rows doesn't match the actual number of rows in the CSV file. I have found an alternative way to do it, but I would like to know why this approach is not working correctly.

In the CSV file, there are cells with huge text and I have to use the encoding to avoid errors reading the csv file.

Any help is appreciated!

Chris · Accepted Answer

Let's assume you have a CSV file in which one cell contains multi-line text.

$ cat example.csv
colA,colB
1,"Hi. This is Line 1.
And this is Line2"

By the look of it, this file has three lines, and wc -l agrees:

$ wc -l example.csv
3 example.csv

And so does open with sum:

sum(1 for row in open('./example.csv',"r",encoding="utf-8"))
# 3

But now if you read it with a CSV parser such as pandas.read_csv:

import pandas as pd

df = pd.read_csv('./example.csv')
df
   colA                                    colB
0     1  Hi. This is Line 1.
And this is Line2

Another way to get the correct number of rows is to parse the file with the csv module:

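import csv

# newline="" lets the csv module handle line breaks inside quoted fields itself
with open(csv_file, "r", encoding="utf-8", newline="") as f:
    reader = csv.reader(f, delimiter=",")
    data = list(reader)
    row_count = len(data)  # counts parsed records, including the header row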

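Excluding the header, the CSV contains 1 row, which I believe is what you expect. This is because a CSV parser treats the quoted text in colB's first cell (the huge text block) as a single field, even though it contains a line break, whereas iterating over the file object counts every physical line. Note that row_count from csv.reader includes the header row, so it is 2 for this file.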
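To make the difference concrete, here is a minimal sketch that recreates the example file in a temporary directory and counts it both ways. The helper names count_physical_lines and count_csv_records are made up for illustration, and the sketch assumes the file has exactly one header row.

```python
import csv
import os
import tempfile

# Same content as example.csv above: a header plus one record whose
# second field contains a line break inside quotes.
SAMPLE = 'colA,colB\n1,"Hi. This is Line 1.\nAnd this is Line2"\n'


def count_physical_lines(path):
    """Count newline-delimited lines, like wc -l or iterating over open()."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        return sum(1 for _ in f)


def count_csv_records(path, has_header=True):
    """Count parsed CSV records; a quoted multi-line field stays in one record."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        n = sum(1 for _ in csv.reader(f))
    # Assumption: at most one header row, dropped from the count if present.
    return max(n - 1, 0) if has_header else n


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(SAMPLE)
    physical = count_physical_lines(path)
    records = count_csv_records(path)

print(physical, records)  # 3 1
```

Counting with a generator over csv.reader avoids building the full list in memory, which may matter when the cells hold large blocks of text.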
