Reputation: 417
I have a 3,000,000,000-line text file, and I open it with the command below:
    with open("/data/tmp/tbl_show_follow.txt") as infile:
Sometimes I need to kill my Python script while it is reading this file, and the next time I run it I need to continue from the last position I read. My current solution is to use a counter, count_i, to remember the position and print it to the log every 100,000 lines:
    20161108 21:19 last position : 100000
    20161108 22:34 last position : 200000
    20161108 23:34 last position : 300000
    .......
    20161408 23:34 last position : 200000000
When I run the Python script again, I have to change the condition like this:
    count_i = 0
    with open("/data/tmp/tbl_show_follow.txt") as infile:
        for line in infile:
            count_i += 1  # count_i is now the 1-based number of the current line
            if count_i > 300000:
                pass  # do something with line ...
But if my last position is 200,000,000 and I stop my Python script, the next run has to read the file from the beginning and count lines 1 to 200,000,000 again. That seems very wasteful. How can I start directly from the 200,000,000th line? Is there any way to remember the last position I read in the file?
Upvotes: 0
Views: 2934
Reputation: 2220
Here you are logging (or saving) the number of lines read. The problem is that when you start reading a file, you don't know how long the lines are. For example, consider a file that looks like this:
    line1
    line number two
    line3
On disk, this file is saved as one continuous stream, like so (on Unix):
    line1\nline number two\nline3
Now, there is no way to know beforehand where line3 starts, because that depends on how long line1 and line number two are. You can only know this once you have read them and found where the \n characters are.
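To make this concrete, here is a small sketch that finds where each line of the example file starts. It uses an in-memory io.BytesIO as a stand-in for the file on disk (an illustration choice, not part of the question). Note that it has to walk every line and add up the lengths, because the offsets cannot be known without scanning for the \n characters:

```python
import io

def line_start_offsets(stream):
    """Return the byte offset at which each line of a binary stream begins."""
    offsets = []
    pos = 0
    for line in stream:      # each line is bytes and includes its trailing b"\n"
        offsets.append(pos)
        pos += len(line)     # the next line starts right after this one
    return offsets

data = io.BytesIO(b"line1\nline number two\nline3")
print(line_start_offsets(data))  # [0, 6, 22]
```

"line1\n" is 6 bytes, so line number two starts at offset 6; it is 16 bytes including its \n, so line3 starts at offset 22.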
So the simple solution is to log/save the actual file position you are at, rather than the line count. This is the tell() you see in the other answers: it returns your current position in the file (a byte offset when the file is opened in binary mode). You still don't know how many lines come before that position, but at least you know exactly where you left off last time, and seek() can take you straight back there.
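A minimal sketch of that approach, assuming the byte offset is stored in a small side file next to the data. The function names, the checkpoint path, the handle_line callback and the every parameter are hypothetical, chosen for illustration only. The data file is opened in binary mode so that tell() returns a plain byte offset that seek() can use on the next run:

```python
import os

def load_offset(checkpoint_path):
    """Return the saved byte offset, or 0 if no checkpoint exists yet."""
    try:
        with open(checkpoint_path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0

def save_offset(checkpoint_path, offset):
    """Write the offset to a temp file, then atomically replace the checkpoint."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(offset))
    os.replace(tmp_path, checkpoint_path)  # a kill mid-write cannot corrupt the old checkpoint

def process_file(data_path, checkpoint_path, handle_line, every=100_000):
    """Resume reading data_path from the saved offset, checkpointing every `every` lines."""
    with open(data_path, "rb") as infile:  # binary mode: tell() is a byte offset
        infile.seek(load_offset(checkpoint_path))  # jump past everything already handled
        for count, line in enumerate(infile, 1):
            handle_line(line)  # hypothetical per-line work; line is bytes
            if count % every == 0:
                save_offset(checkpoint_path, infile.tell())
        save_offset(checkpoint_path, infile.tell())  # record end of file when done
```

For the question's file this might be called as process_file("/data/tmp/tbl_show_follow.txt", "/data/tmp/tbl_show_follow.pos", handle_line). Each line arrives as bytes, so decode it if text is needed. Because the offset is saved only every `every` lines, killing the script between checkpoints means up to that many lines are handled again on the next run, so the per-line work should tolerate repeats.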
Upvotes: 0
Reputation: 149796
You can use file.tell() to get the file’s current position (measured in bytes when the file is opened in binary mode) and file.seek() to set it.
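A short sketch of tell() and seek() across two separate opens, using a small temporary sample file as a hypothetical stand-in for the real one. It uses binary mode because in Python 3 text mode tell() returns an opaque number and cannot be called inside a for loop over the file; the last part of the sketch demonstrates that restriction:

```python
import os
import tempfile

# Hypothetical sample file standing in for /data/tmp/tbl_show_follow.txt.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(b"line1\nline number two\nline3\n")

# First run: read one line, then record the byte offset where we stopped.
with open(path, "rb") as f:
    first = f.readline()
    saved = f.tell()  # offset just past the first line's b"\n"

# Later run: seek straight to the saved offset instead of re-reading from the start.
with open(path, "rb") as f:
    f.seek(saved)
    rest = f.readlines()

print(first, saved, rest)
# prints: b'line1\n' 6 [b'line number two\n', b'line3\n']

# In text mode, Python 3 refuses tell() while a "for line in f" loop is
# running and raises OSError instead.
text_mode_tell_failed = False
with open(path) as f:
    try:
        for line in f:
            f.tell()
    except OSError:
        text_mode_tell_failed = True
print(text_mode_tell_failed)  # True
```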
Upvotes: 3