Reputation: 83
I have a CSV file of dates and a float (day,month,year,float). Here is a sample:
1,1,2000,4076.79
2,1,2000,1216.82
3,1,2000,1299.68
4,1,2000,637.36
5,1,2000,3877.91
6,1,2000,3308.99
7,1,2000,2925.93
8,1,2000,1559.09
9,1,2000,3190.81
10,1,2000,3008.66
11,1,2000,2026.35
12,1,2000,3279.61
13,1,2000,3601.6
14,1,2000,2021.1
15,1,2000,2103.62
16,1,2000,609.64
17,1,2000,633.16
18,1,2000,1195.34
I want to read the first line then the last one:
handle = open(getInputFileName(), "r")
getInputFileName() is, obviously, a function that returns the file name. Then,
print "numberlines", numberLines #DEBUG#
>>> 3660
numberLines is the number of lines in the file. Then,
handle.seek(0)
lineData = handle.readline().split(",")
print lineData #DEBUG#
>>> ['1','1','2000','4076.79\n']
Up to here everything works just fine. But then,
handle.seek(numberLines-1)
lineData = handle.readline().split(",")
print lineData #DEBUG#
>>>['7', '7', '2000', '2347.51\n']
But in fact the last line in the file is 31,12,2009,3823.02
Why isn't seek going all the way down?
I tried deleting the line that it gets stuck at, but then the program crashed with ValueError: could not convert string to float:
(I later convert the values in lineData to float):
newestDate.insert(1,float(lineData[1]))
I checked the file for problems with the lines, but the format never changes. Why does my code work for the first line but not the last?
Upvotes: 0
Views: 623
Reputation: 51683
file.seek(offset[, whence]) operates on byte positions inside the file, not on line numbers. If you want to operate on lines, use readline() or iterate over the file:
with open("file.txt", "r") as f:
    first = next(f)  # see comment Jean-François Fabre
    for last in f:   # and tdelanys comment :o)
        pass         # do nothing with all other lines, last will hold the last one
Now first and last hold the first and last line, respectively.
The advantage here is that you hold at most one line of text in memory and discard the rest. AFAIK there is no line-based way to get just the first and last line of a file without stepping through it; any shortcut has to work with byte offsets instead.
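To make the byte-versus-line distinction concrete, here is a minimal sketch. The three-line sample and the temporary file are assumptions made up for illustration; only seek() and readline() come from the discussion above. It shows that seek(2) jumps to byte 2 of the first line rather than to line 2, which is why seek(numberLines-1) ends up somewhere in the middle of the file.

```python
import os
import tempfile

# Hypothetical three-line sample in the question's day,month,year,value format.
sample = b"1,1,2000,4076.79\n2,1,2000,1216.82\n3,1,2000,1299.68\n"

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

with open(path, "rb") as f:      # binary mode: offsets are plain byte counts
    f.seek(2)                    # byte 2, NOT line 2
    after_seek = f.readline()    # the rest of the FIRST line from byte 2 on

    f.seek(17)                   # the first line is 17 bytes long, newline included
    second_line = f.readline()   # so byte 17 is where the second line starts

os.remove(path)

print(after_seek)   # b'1,2000,4076.79\n'
print(second_line)  # b'2,1,2000,1216.82\n'
```

Because the lines in the real file have different lengths, there is no simple formula that turns a line number into a byte offset without reading the file.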
If you want to parse the data, though, follow DyZ's suggestion of using the csv module and a reader; it's safer. If you feel adventurous, go for pandas: it has plenty of built-in CSV capability :) and is able to read big CSVs chunkwise to be more memory friendly (see e.g. How to read a 6 GB csv file with pandas)
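For very large files where even a single pass is too slow, the byte-based nature of seek() can be turned into an advantage: jump to the end and read backwards until a newline turns up. The following is only a sketch under assumptions: read_last_line, its chunk_size parameter and the sample file are hypothetical names introduced here, and it assumes an encoding such as ASCII or UTF-8 where a newline is the single byte b"\n".

```python
import os
import tempfile

def read_last_line(path, chunk_size=1024):
    """Return the last non-empty line of a file as bytes, reading from the end."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)       # jump to the end of the file
        pos = f.tell()               # file size in bytes
        buffer = b""
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)              # move one chunk further back
            buffer = f.read(step) + buffer
            stripped = buffer.rstrip(b"\r\n")    # ignore trailing line endings
            newline_at = stripped.rfind(b"\n")
            if newline_at != -1:     # found the start of the last line
                return stripped[newline_at + 1:]
        return buffer.rstrip(b"\r\n")            # file had at most one line

# Hypothetical sample file for demonstration.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"1,1,2000,4076.79\n2,1,2000,1216.82\n31,12,2009,3823.02\n")
    path = tmp.name

last = read_last_line(path).decode("ascii").split(",")
os.remove(path)

print(last)  # ['31', '12', '2009', '3823.02']
```

This only touches the tail of the file, but it splits on raw commas and newlines, so it inherits the quoting problems of hand-parsed CSV; for most files the simple loop is the safer choice.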
Upvotes: 3
Reputation: 57105
Do not read CSV files by hand (your code fails if there is any quoted item with a comma in a row, like ...,"1,2000",...). There is a CSV reader for that:
import csv
with open("foo.csv", newline="") as infile:  # newline="" is what the csv docs recommend
    reader = csv.reader(infile)
    data = list(reader)  # reads every row into memory
data[0]  # First
# ['1', '1', '2000', '4076.79']
data[-1]  # Last
# ['18', '1', '2000', '1195.34']
If memory is an issue, read the first line, skip the rest of the file, and retain the last line, as explained in the other answer.
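A minimal sketch of that memory-friendly variant, combining csv.reader with the skip-through loop. The io.StringIO object is a stand-in assumed here so the example runs on its own; with a real file you would pass the object returned by open("foo.csv", newline="") instead.

```python
import csv
import io

# Hypothetical in-memory stand-in for the CSV file.
infile = io.StringIO(
    "1,1,2000,4076.79\n"
    "2,1,2000,1216.82\n"
    "18,1,2000,1195.34\n"
)

reader = csv.reader(infile)
first = next(reader)   # first row, already split into fields
last = first           # covers the case of a single-row file
for last in reader:    # step through the rest, keeping only the latest row
    pass

# The csv module strips the line ending, so float() works on the last field.
first_value = float(first[3])
last_value = float(last[3])

print(first, first_value)  # ['1', '1', '2000', '4076.79'] 4076.79
print(last, last_value)    # ['18', '1', '2000', '1195.34'] 1195.34
```

Only one row is held at a time, and quoted fields containing commas are still parsed correctly.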
Upvotes: 3