brodieR

Reputation: 69

Python For loop reads only half of file

I am trying to iterate through a .csv file and do some calculations on its contents. The file is 10001 lines long, but when my program runs it only seems to read 5001 of those lines. Am I doing something wrong when reading in my data, or am I running into a memory limit or some other limitation? The calculations run, but in some instances the results are off from the expected values, so I am led to believe that including the missing half of the data will fix this.

fileName = 'normal.csv' #input("Enter a file name: ").strip()
file = open(fileName, 'r') #open the file for reading
header = file.readline().strip().split(',') #Get the header line
data = [] #Initialise the dataset
for index in range(len(header)):
    data.append([])
for yy in file:
    ln = file.readline().strip().split(',') #Store the line
    for xx in range(len(data)):
        data[xx].append(float(ln[xx]))

And here is some sample output. It is not completely formatted yet, but it will be eventually:

"""The file normal.csv contains 3 columns and 5000 records.
         Column Heading   |        Mean        |     Std. Dev.      
      --------------------+--------------------+--------------------
      Width [mm]|999.9797|2.5273
      Height [mm]|499.9662|1.6889
      Thickness [mm]|12.0000|0.1869"""

As this is homework, I would ask that responses be helpful without giving the solution outright. Thank you.

Upvotes: 0

Views: 2176

Answers (1)

Martijn Pieters

Reputation: 1121534

That's because you are asking Python to read lines in two different locations:

for yy in file:

and

ln = file.readline().strip().split(',') #Store the line

yy is already a line from the file, but you ignored it; iteration over a file object yields lines from the file. You then read another line using file.readline(), so each pass through the loop consumes two lines and only every second line ends up in data.
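To see the effect in isolation, here is a minimal sketch that uses an in-memory io.StringIO as a stand-in for the real file. The sample contents and the names fake_file and kept are made up for illustration; they are not from the question.

```python
import io

# Hypothetical stand-in for normal.csv: one header line plus four data lines.
fake_file = io.StringIO("a,b\n1,2\n3,4\n5,6\n7,8\n")

header = fake_file.readline().strip().split(',')

kept = []
for yy in fake_file:                    # the loop itself consumes one line (ignored here)
    ln = fake_file.readline().strip()   # readline() then consumes the following line
    kept.append(ln)

print(kept)  # ['3,4', '7,8'] -- only every second data line is kept
```

Four data lines go in, but only two come out, which matches the 5000 records reported for a file with 10000 data lines.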

If you use iteration, don't use readline() as well; just use yy:

for yy in file:
    ln = yy.strip().split(',') #Store the line
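Put together with the rest of the column-building code, the corrected loop might look like the sketch below. It again reads from a hypothetical in-memory sample rather than the real normal.csv, and the sample values are invented; with a real file you would iterate over the object returned by open() in the same way.

```python
import io

# Hypothetical sample data standing in for normal.csv.
sample = io.StringIO(
    "Width [mm],Height [mm]\n"
    "1000.0,500.0\n"
    "999.5,499.5\n"
    "1000.5,500.5\n"
)

header = sample.readline().strip().split(',')   # first line is the header
data = [[] for _ in header]                     # one list per column

for yy in sample:                               # yy is the line itself
    ln = yy.strip().split(',')
    for xx, value in enumerate(ln):
        data[xx].append(float(value))           # store each value in its column

print(len(data[0]))  # 3 -- every data line is read exactly once
```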

You are re-inventing the CSV-reading wheel, however. Just use the csv module instead.

You can read all data in a CSV file into a list per column with some zip() function trickery:

import csv

with open(fileName, 'r', newline='') as csvfile:
    # QUOTE_NONNUMERIC converts every unquoted field to float,
    # so the header fields must be quoted in the file for this to work
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)
    header = next(reader, None)   # read one row, the header, or None
    data = list(zip(*reader))  # transpose rows to columns
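One caveat: csv.QUOTE_NONNUMERIC tells the reader to convert every unquoted field to float, so an unquoted header such as Width [mm] would raise a ValueError. If the header in your file is not quoted (an assumption, since the file itself is not shown), a variant that reads plain strings and converts the data explicitly could look like this sketch. The in-memory sample and the name columns are illustrative only.

```python
import csv
import io

# Hypothetical sample; with a real file use open(fileName, 'r', newline='').
csvfile = io.StringIO(
    "Width [mm],Height [mm]\r\n"
    "1000.0,500.0\r\n"
    "999.5,499.5\r\n"
)

reader = csv.reader(csvfile)          # default quoting: every field is a string
header = next(reader, None)           # header row stays as strings

# zip(*reader) transposes the remaining rows into columns;
# each column is then converted to floats.
columns = [tuple(float(value) for value in column) for column in zip(*reader)]

print(header)   # ['Width [mm]', 'Height [mm]']
print(columns)  # [(1000.0, 999.5), (500.0, 499.5)]
```

The zip(*reader) step passes each row as a separate argument to zip(), which then pairs up the first items of every row, the second items, and so on, giving one tuple per column.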

Upvotes: 2
