Naji Krayem

Reputation: 83

seek() not working properly, although file opened in "r" mode

I have a CSV file in which each row holds a date and a float (day,month,year,float). Here is a sample:

1,1,2000,4076.79
2,1,2000,1216.82
3,1,2000,1299.68
4,1,2000,637.36
5,1,2000,3877.91
6,1,2000,3308.99
7,1,2000,2925.93
8,1,2000,1559.09
9,1,2000,3190.81
10,1,2000,3008.66
11,1,2000,2026.35
12,1,2000,3279.61
13,1,2000,3601.6
14,1,2000,2021.1
15,1,2000,2103.62
16,1,2000,609.64
17,1,2000,633.16
18,1,2000,1195.34

I want to read the first line then the last one:

handle = open(getInputFileName(), "r")

getInputFileName() is a function that returns the file name. Then:

print "numberlines", numberLines        #DEBUG# 
>>> 3660

numberLines is the number of lines in the file. Then:

handle.seek(0)
lineData = handle.readline().split(",")
print lineData      #DEBUG#
>>> ['1','1','2000','4076.79\n']

Up to here everything works fine. But then:

handle.seek(numberLines-1)
lineData = handle.readline().split(",")
print lineData      #DEBUG#
>>>['7', '7', '2000', '2347.51\n']

But the last line in the file is actually 31,12,2009,3823.02. Why isn't seek going all the way down? I tried deleting the line it gets stuck at, but then the program crashed with ValueError: could not convert string to float: (I later use lineData as a float):

newestDate.insert(1,float(lineData[1]))

I checked the file for malformed lines, but the format never changes. Why does my code work for the first line but not the last?

Upvotes: 0

Views: 623

Answers (2)

Patrick Artner

Reputation: 51683

file.seek(offset[, whence]) operates on byte positions inside the file, not line numbers. If you want to operate on lines, use readline() or iterate over the file:

with open("file.txt", "r") as f:
    first = next(f) # see comment Jean-François Fabre
    for last in f:  # and tdelanys comment :o)
        pass # do nothing with all other lines, last will hold the last one  

Now first and last hold the first and last line, respectively.

The advantage here is that you hold at most one line of text in memory and discard the rest. AFAIK there is no simple way to get the first and last line of a file without stepping through it.
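To make the byte-position point concrete, here is a minimal sketch. It assumes a small temporary file with three rows in the question's layout (the file and the variable names are made up for illustration) and opens it in binary mode so the offsets are plain byte counts. seek(2) lands on the third byte of the first line, not on line 2, so readline() returns only the rest of that line:

```python
import os
import tempfile

# Hypothetical sample rows in the question's day,month,year,value layout.
rows = ["1,1,2000,4076.79", "2,1,2000,1216.82", "3,1,2000,1299.68"]

# newline="" keeps "\n" as a single byte on every platform.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("\n".join(rows) + "\n")
    path = tmp.name

with open(path, "rb") as f:   # binary mode: offsets are byte counts
    f.seek(2)                 # jump to byte 2, NOT to line 2
    partial = f.readline()    # remainder of the first line from byte 2 onward
    offset_after = f.tell()   # byte offset where the second line starts

os.remove(path)

print(partial)       # b'1,2000,4076.79\n'
print(offset_after)  # 17
```

In the question, seek(numberLines-1) therefore jumps to byte 3659 of the file, which is somewhere in its first few hundred lines, and readline() returns whatever is left of the line at that position.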

If you want to parse the data, though, follow DYZ's suggestion of using the csv module and a reader; it's safer. If you feel adventurous, go for pandas: it has plenty of built-in CSV capability :) and can read big CSVs chunkwise to be more memory-friendly (see e.g. How to read a 6 GB csv file with pandas).

Upvotes: 3

DYZ

Reputation: 57105

Do not read CSV files by hand (your code fails if there is any quoted item with a comma in a row, like ...,"1,2000",...). There is a CSV reader for that:

import csv
with open("foo.csv") as infile:
    reader = csv.reader(infile)
    data = list(reader)

data[0] # First
# ['1', '1', '2000', '4076.79']
data[-1] # Last
# ['18', '1', '2000', '1195.34']

If memory is an issue, read the first line, skip the rest of the file, and retain the last line, as explained in the other answer.
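A rough sketch of that memory-friendly variant, combining the csv reader with the loop from the other answer. The helper name first_and_last_rows and the in-memory sample are assumptions made for illustration; with a real file you would pass the object returned by open("foo.csv", newline="") instead:

```python
import csv
import io

def first_and_last_rows(lines):
    """Return (first, last) parsed CSV rows, holding one row at a time."""
    reader = csv.reader(lines)
    first = next(reader, None)  # None if the input is empty
    last = first                # a one-row input has the same first and last row
    for last in reader:         # each iteration overwrites the previous row
        pass
    return first, last

# Hypothetical stand-in for the real file, using rows from the question's sample.
sample = io.StringIO("1,1,2000,4076.79\n2,1,2000,1216.82\n18,1,2000,1195.34\n")
first, last = first_and_last_rows(sample)
newest_value = float(last[3])   # the csv module has already stripped the newline

print(first)  # ['1', '1', '2000', '4076.79']
print(last)   # ['18', '1', '2000', '1195.34']
```

Because the reader strips the line terminator, the last field converts to float without the trailing "\n" seen in the question's debug output.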

Upvotes: 3
