Reputation: 1829
I have code like this:
import itertools

# f is an already-opened file; num_lines is its total number of lines
goto_line = num_lines
found = False
while not found:
    line_str = next(itertools.islice(f, goto_line - 1, goto_line))
    goto_line = goto_line // 2
    # check the data here and set found = True when appropriate
line_str is correct on the first pass, but on every pass after that it reads a different line than it should.
For example, goto_line starts off as 1000, and line 1000 is read just fine. On the next iteration goto_line is 500, but it doesn't read line 500; it reads some line closer to 1000.
I'm trying to read specific lines in a large file without reading more than necessary. Sometimes it jumps backwards to a line and sometimes forward.
I did try linecache, but I typically don't run this code more than once on the same file.
Upvotes: 2
Views: 5383
Reputation:
You cannot go back in the file this way (perhaps there is some way, depending on how the file is opened). The standard file iterator moves only forward; in fact, most iterators do, because Python's iterator protocol only supports forward iteration. So after reading k lines, reading another k/2 lines actually gives you line k + k/2.
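To see this concretely, here is a minimal sketch that assumes an io.StringIO with 20 numbered lines as a stand-in for the opened file; iterating a real file object advances it the same way.

```python
import io
from itertools import islice

# In-memory stand-in for the opened file f (assumption: 20 numbered lines).
f = io.StringIO("".join("line %d\n" % n for n in range(1, 21)))

# First call: skip 9 lines and return the 10th, i.e. "line 10".
first = next(islice(f, 9, 10))

# Second call: f already sits after line 10, so asking for the 5th line
# actually returns line 10 + 5 = 15.
second = next(islice(f, 4, 5))

print(first.strip())   # line 10
print(second.strip())  # line 15
```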
You could try reading the whole file into memory, but you have a lot of data, so memory consumption probably becomes an issue. You could use file.seek to move around in the file, but that's still a lot of work, because seek needs the byte offset of the line you want. Perhaps you could use a memory-mapped file? Jumping straight to a given line that way is only possible if lines are fixed-size, though. If it's necessary, you could pre-calculate the line numbers you'd like to check and save all those lines in one pass, so you don't have to scroll back after reading the whole file. There shouldn't be too many of them: if I'm not mistaken, roughly int(log_2(line_count)) + 1.
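A minimal sketch of that last idea, assuming the search halves goto_line as in the question. halving_targets and read_lines_once are hypothetical helper names, and an io.StringIO of 1000 numbered lines stands in for the real file. The single forward pass stops as soon as the largest wanted line has been read.

```python
import io


def halving_targets(num_lines):
    """Line numbers visited by goto_line = num_lines, num_lines // 2, ... down to 1."""
    targets = []
    n = num_lines
    while n >= 1:
        targets.append(n)
        n //= 2
    return targets


def read_lines_once(f, line_numbers):
    """Collect the requested 1-based lines from f in a single forward pass.

    Returns a dict mapping line number -> line text (newline included).
    """
    wanted = set(line_numbers)
    last = max(wanted) if wanted else 0
    found = {}
    for lineno, line in enumerate(f, start=1):
        if lineno in wanted:
            found[lineno] = line
        if lineno >= last:
            break  # everything we need has been read; stop early
    return found


# Demo: an in-memory stand-in for a 1000-line file.
demo = io.StringIO("".join("line %d\n" % n for n in range(1, 1001)))
targets = halving_targets(1000)
lines = read_lines_once(demo, targets)
for n in targets[:3]:
    print(n, lines[n].strip())
```

For 1000 lines this stores 10 lines (1000, 500, ..., 3, 1). The question's loop can then walk targets in order and look each line up in the dictionary instead of calling islice again.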
Upvotes: 0
Reputation: 601899
Python iterators can be consumed only once. This is easiest to see with an example. The following code
from itertools import islice
a = range(10)
i = iter(a)  # one iterator, shared by every islice() call below
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))
prints
[1, 2]
[4, 5]
[7, 8]
[]
The slicing always starts where we stopped last time.
The easiest way to make your code work is to use f.readlines() to get a list of the lines in the file and then use normal Python list slicing [i:j]. If you really want to use islice(), you could start reading the file from the beginning each time by using f.seek(0), but this will be very inefficient.
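A minimal sketch of the f.seek(0) variant, applied to the halving loop from the question. get_line is a hypothetical helper name, and an io.StringIO of 1000 numbered lines stands in for the real file. With f.readlines() you would instead index the list directly (lines[goto_line - 1]), at the cost of holding every line in memory.

```python
import io
from itertools import islice


def get_line(f, lineno):
    """Return 1-based line `lineno` of f, or None if f has fewer lines.

    Rewinds first, so every call rescans the file from the beginning.
    """
    f.seek(0)  # go back to the start so islice counts from line 1 again
    return next(islice(f, lineno - 1, lineno), None)


# In-memory stand-in for the question's file (assumption: 1000 numbered lines).
f = io.StringIO("".join("line %d\n" % n for n in range(1, 1001)))
goto_line = 1000
seen = []
while goto_line >= 1:
    seen.append(get_line(f, goto_line).strip())
    goto_line //= 2
print(seen[:3])  # ['line 1000', 'line 500', 'line 250']
```

Each call rereads up to lineno lines, so this loop reads 1000 + 500 + 250 + ... = 1994 lines in total, roughly twice the file, which is where the inefficiency comes from.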
Upvotes: 6