Zeno

Reputation: 1829

Python: itertools.islice not working in a loop

I have code like this:

import itertools

# f is an already-opened file
found = False
goto_line = num_lines  # total number of lines
while not found:
    line_str = next(itertools.islice(f, goto_line - 1, goto_line))
    goto_line = goto_line // 2  # integer halving (goto_line/2 in Python 2)
    # check the data here; set found = True when done

line_str is correct on the first pass, but every pass after that reads a different line than it should.

So for example, goto_line starts off as 1000. It reads line 1000 just fine. Then the next loop, goto_line is 500 but it doesn't read line 500. It reads some line closer to 1000.

I'm trying to read specific lines in a large file without reading more than necessary. Sometimes it jumps backwards to a line and sometimes forward.

I did try linecache, but I typically don't run this code more than once on the same file.

Upvotes: 2

Views: 5383

Answers (2)

user395760

You cannot go back in the file this way (perhaps there is some way, depending on how the file is opened). The standard file iterator, like most iterators (Python's iterator protocol only supports forward iteration), moves only forward. So after reading k lines, asking islice for line k/2 actually gives you line k + k/2: it skips k/2 - 1 lines counted from where the previous read stopped, not from the start of the file.
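A small reproduction of this effect, as a sketch: io.StringIO stands in for the real file, and the 2000-line content ("line 1" .. "line 2000") is made up for illustration. The second islice call starts counting from where the first one stopped, so "line 500" comes out as line 1500.

```python
import io
from itertools import islice

# Hypothetical stand-in for the real file: 2000 lines, "line 1" .. "line 2000"
f = io.StringIO("".join(f"line {n}\n" for n in range(1, 2001)))

goto_line = 1000
first = next(islice(f, goto_line - 1, goto_line))   # consumes lines 1..1000
goto_line //= 2
second = next(islice(f, goto_line - 1, goto_line))  # skips 499 more lines first

print(first.strip())   # line 1000
print(second.strip())  # line 1500, not line 500
```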

You could try reading the whole file into memory, but you have a lot of data, so memory consumption probably becomes an issue. You could use file.seek to move around in the file, but that's still a lot of work. Perhaps you could use a memory-mapped file? Jumping straight to a given line that way is only possible if lines are fixed-size, though. If it's necessary, you could pre-calculate the line numbers you'd like to check and save all those lines in one pass (there shouldn't be many: roughly int(log_2(line_count)) + 1, if I'm not mistaken), so you don't have to scroll back after reading through the file.
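A minimal sketch of the pre-calculation idea, assuming the sequence of lines to check is pure halving (as in the question's loop) and does not depend on the data found along the way. The helper names halving_targets and read_lines_once are hypothetical, and io.StringIO stands in for the real file.

```python
import io


def halving_targets(line_count):
    """Return the line numbers visited by repeated halving: n, n//2, ..., 1."""
    targets = []
    n = line_count
    while n >= 1:
        targets.append(n)
        n //= 2
    return targets


def read_lines_once(f, wanted):
    """Collect the wanted 1-based line numbers in a single forward pass over f."""
    wanted = set(wanted)
    if not wanted:
        return {}
    last = max(wanted)
    found = {}
    for lineno, line in enumerate(f, start=1):
        if lineno in wanted:
            found[lineno] = line
        if lineno >= last:
            break  # stop once the highest wanted line has been read
    return found


# Hypothetical in-memory file with 1000 lines: "line 1" .. "line 1000"
f = io.StringIO("".join(f"line {n}\n" for n in range(1, 1001)))
targets = halving_targets(1000)  # 10 line numbers, i.e. int(log2(1000)) + 1
lines = read_lines_once(f, targets)
for n in targets:
    line_str = lines[n]
    # check the data here, as in the question's loop
```

Since the file is read only once and only up to the highest wanted line, the loop afterwards can look lines up in any order.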

Upvotes: 0

Sven Marnach

Reputation: 601899

Python iterators can be consumed only once. This is easiest seen by example. The following code

from itertools import islice
a = range(10)
i = iter(a)
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))

prints

[1, 2]
[4, 5]
[7, 8]
[]

The slicing always starts where we stopped last time.

The easiest way to make your code work is to use f.readlines() to get a list of the lines in the file and then use normal Python list slicing [i:j]. If you really want to use islice(), you could start reading the file from the beginning each time by using f.seek(0), but this will be very inefficient.
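Both approaches, sketched side by side. The helper names line_via_list and line_via_rewind are hypothetical, and io.StringIO stands in for the real file; line numbers are 1-based, so list index lineno - 1 is used.

```python
import io
from itertools import islice


def line_via_list(lines, lineno):
    """Return 1-based line `lineno` from a list produced by f.readlines()."""
    return lines[lineno - 1]


def line_via_rewind(f, lineno):
    """Rewind with f.seek(0), then islice forward to 1-based line `lineno`.

    Every call re-reads the file from the start, which is slow for big files.
    """
    f.seek(0)
    return next(islice(f, lineno - 1, lineno))


# Hypothetical in-memory file with 1000 lines: "line 1" .. "line 1000"
f = io.StringIO("".join(f"line {n}\n" for n in range(1, 1001)))

lines = f.readlines()               # whole file in memory as a list
print(line_via_list(lines, 1000))   # line 1000
print(line_via_list(lines, 500))    # line 500
print(lines[249:251])               # ordinary slicing: lines 250 and 251

print(line_via_rewind(f, 1000))     # line 1000
print(line_via_rewind(f, 500))      # line 500
```

The list version costs memory proportional to the file but makes every later lookup cheap; the rewind version keeps memory flat but pays a full re-scan up to the target line on each call.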

Upvotes: 6
