LawrenceLi

Reputation: 417

How to read a file from the last read position?

I have a text file with 3,000,000,000 lines. I open it with the command below:

with open("/data/tmp/tbl_show_follow.txt") as infile:

Sometimes I need to kill my Python script partway through, and the next time I run it I need to resume from the last position I read. My current solution is to keep a counter, count_i, and print the position to the log every 100,000 lines:

20161108 21:19  last position : 100000
20161108 22:34  last position : 200000
20161108 23:34  last position : 300000
.......
20161408 23:34  last position : 200000000

When I run the Python script again, I have to change the condition like this:

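with open("/data/tmp/tbl_show_follow.txt") as infile:
    # enumerate() supplies the line counter, starting at 1
    for count_i, line in enumerate(infile, start=1):
        # Skip the lines already handled in the previous run
        if count_i > 300000:
            pass  # do sth ...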

But if my last position is 200,000,000 and I stop the script, the next run has to read the file from the beginning and count from 1 to 200,000,000. That seems very wasteful. How can I start from the 200,000,000th line? Is there a way to remember the last position I read in the file?

Upvotes: 0

Views: 2934

Answers (2)

David van rijn

Reputation: 2220

Here you are logging (or saving) the number of lines read. The problem is that when you start reading a file, you don't know how long the lines are. For example, consider a file that looks like this:

line1
line number two
line3

On disk, this file is stored as one continuous stream of bytes, like so (on Unix):

line1\nline number two\nline3

Now, there is no way to know beforehand where line3 starts, because that depends on how long line1 and the second line are. You can only know this once you have read them and found where the \n characters are.
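To make this concrete, here is a minimal sketch that writes the three-line sample to a temporary file and records the byte offset at which each line starts. The helper name line_start_offsets and the temporary file name are made up for illustration. The file is opened in binary mode so that len(line) counts bytes.

```python
import os
import tempfile


def line_start_offsets(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    position = 0
    with open(path, "rb") as infile:
        for line in infile:
            offsets.append(position)
            # The next line starts right after this one, newline included
            position += len(line)
    return offsets


with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "wb") as sample:
        sample.write(b"line1\nline number two\nline3")
    offsets = line_start_offsets(sample_path)

print(offsets)  # [0, 6, 22]
```

The third offset, 22, is only known after both earlier lines have been scanned, which is why a line number alone cannot be turned into a position without rereading the file.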

So the simple solution is to log/save the actual file position you are at. This is the tell() you see in the other answers. It gives the current byte offset in the file. You still don't know how many lines come before that point, but you at least know that this is where you left off last time.

Upvotes: 0

Eugene Yarmash

Reputation: 149796

You can use file.tell() to get the file’s current position (measured in bytes) and file.seek() to set it.
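As a rough sketch of how this could be used to resume after the script is killed: the code below saves a byte offset to a small checkpoint file every few lines and calls seek() with that offset on the next run. The names process_file, save_offset, load_offset, and the checkpoint file are all hypothetical, and the demo uses a tiny temporary file rather than the real data. It opens the file in binary mode and advances the offset by len(line), which yields the same value tell() would report after each line; in text mode, Python 3 disables tell() while you iterate with a for loop.

```python
import os
import tempfile


def save_offset(checkpoint_path, offset):
    """Write the offset to a temporary file, then rename it into place."""
    temp_path = checkpoint_path + ".tmp"
    with open(temp_path, "w") as checkpoint:
        checkpoint.write(str(offset))
    os.replace(temp_path, checkpoint_path)


def load_offset(checkpoint_path):
    """Return the saved byte offset, or 0 if there is no checkpoint yet."""
    try:
        with open(checkpoint_path) as checkpoint:
            return int(checkpoint.read())
    except FileNotFoundError:
        return 0


def process_file(data_path, checkpoint_path, handle_line, save_every=100000):
    """Resume at the saved offset and checkpoint every save_every lines."""
    offset = load_offset(checkpoint_path)
    with open(data_path, "rb") as infile:
        infile.seek(offset)  # jump straight to where the last run stopped
        for count, line in enumerate(infile, start=1):
            handle_line(line)
            offset += len(line)  # byte position just after this line
            if count % save_every == 0:
                save_offset(checkpoint_path, offset)
    save_offset(checkpoint_path, offset)
    return offset


# Demo on a tiny file: the first run is "killed" on its third line.
with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = os.path.join(tmp_dir, "data.txt")
    checkpoint_path = os.path.join(tmp_dir, "data.offset")
    with open(data_path, "wb") as data:
        data.write(b"a\nbb\nccc\ndddd\n")

    first_run = []

    def interrupted_handler(line):
        if len(first_run) == 2:
            raise KeyboardInterrupt  # simulate stopping the script
        first_run.append(line)

    try:
        process_file(data_path, checkpoint_path, interrupted_handler, save_every=2)
    except KeyboardInterrupt:
        pass

    second_run = []
    final_offset = process_file(data_path, checkpoint_path, second_run.append, save_every=2)

print(second_run)  # [b'ccc\n', b'dddd\n']
```

Lines arrive as bytes here, so decode them if you need text. Any lines handled after the most recent checkpoint are processed again on the next run, so a smaller save_every trades a little extra I/O for less repeated work.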

Upvotes: 3
