Reputation: 178
I have a list of links, and I run a specific function on each one; the function takes about 25 seconds per link. I use Selenium to open each link and get its page source, then run my function. However, whenever I run the program and cancel it, I have to start all over again.
Note: I get the links from different websites' sitemaps.
Is there a way to save my progress and continue it later on?
Upvotes: 0
Views: 1332
Reputation: 25644
I would suggest that you write out the links to a file along with a date/time stamp of the last time it was processed. When you write links to the file, you will want to make sure that you don't write the same link twice. You will also want to date/time stamp a link after you are done processing it.
Once you have this list, when the script starts you read the entire list and process only the links that haven't been processed in X days (or whatever your criterion is).
Steps: process each link, then write a date/time stamp next to it, e.g.
http://www.google.com,1/25/2019 12:00PM
Now, any time you kill the run, the next run will pick up where you left off.
NOTE: Just writing out the date may be enough. It just depends on how often you want to refresh your list (hourly, etc.) or if you want that much detail.
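A minimal sketch of the approach above, assuming a file name of progress.csv and a one-day refresh window (both are my own choices, not from the answer); process() stands in for the real ~25-second job:

```python
import csv
from datetime import datetime, timedelta
from pathlib import Path

PROGRESS_FILE = Path("progress.csv")  # assumed file name
REFRESH_AFTER = timedelta(days=1)     # assumed refresh criterion ("X days")

def load_progress():
    """Return {link: last_processed_datetime or None} from the progress file."""
    progress = {}
    if PROGRESS_FILE.exists():
        with PROGRESS_FILE.open(newline="") as f:
            for row in csv.reader(f):
                link = row[0]
                stamp = row[1] if len(row) > 1 else ""
                progress[link] = datetime.fromisoformat(stamp) if stamp else None
    return progress

def save_progress(progress):
    with PROGRESS_FILE.open("w", newline="") as f:
        writer = csv.writer(f)
        for link, stamp in progress.items():
            writer.writerow([link, stamp.isoformat() if stamp else ""])

def process(link):
    pass  # placeholder for the real per-link work (Selenium etc.)

def run(links):
    progress = load_progress()
    for link in links:
        progress.setdefault(link, None)  # never write the same link twice
    cutoff = datetime.now() - REFRESH_AFTER
    for link, stamp in progress.items():
        if stamp is None or stamp < cutoff:
            process(link)
            progress[link] = datetime.now()
            save_progress(progress)  # persist after each link so a kill resumes cleanly
```

Saving after every link (rather than once at the end) is what makes the run safe to kill at any point.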
Upvotes: 1
Reputation: 12930
This code will work. I assume you already have a function for getting links; I have just used a dummy one, _get_links. You will have to delete the content of the "links" file (and put 0 in the "index" file) after every successful run.
import time

def _get_links():
    # dummy stand-in for the real function that gathers links
    return ["a", "b", "c"]

def _get_links_from_file():
    try:
        with open("links") as file:
            content = file.read().strip()
    except FileNotFoundError:
        return []
    # an empty file means no saved links, not a single empty link
    return content.split(",") if content else []

def _do_something(link):
    print(link)
    time.sleep(30)

def _save_links_to_file(links):
    with open("links", "w") as file:
        file.write(",".join(links))
    print("links saved")

def _save_index_to_file(index):
    with open("index", "w") as file:
        file.write(str(index))
    print("index saved")

def _get_index_from_file():
    with open("index") as file:
        return int(file.read().strip())

def process_links():
    links = _get_links_from_file()
    if len(links) == 0:
        links = _get_links()
        _save_links_to_file(links)
        start = 0
    else:
        start = _get_index_from_file()
        links = links[start:]
    # enumerate from the saved offset so the stored index is always
    # an absolute position in the full list, even across restarts
    for index, link in enumerate(links, start=start):
        _do_something(link)
        _save_index_to_file(index + 1)

if __name__ == '__main__':
    process_links()
Upvotes: 2
Reputation: 429
You should save the links in a text file. You should also save the current index in another text file, initialized to 0.
In your code, you can then loop through the links using something like:
for link in links[index_number:]
At the end of every loop iteration, write the updated index number back to the index file. This lets you continue from where you left off.
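The loop above might look like this; the file names links.txt and index.txt are my own assumptions, process() stands in for the real per-link work, and the sample links are written only so the snippet is self-contained:

```python
def process(link):
    print("processing", link)

# sample input for demonstration; normally the links file already exists
with open("links.txt", "w") as f:
    f.write("http://example.com/a\nhttp://example.com/b\n")

with open("links.txt") as f:
    links = [line.strip() for line in f if line.strip()]

try:
    with open("index.txt") as f:
        index_number = int(f.read().strip())
except (FileNotFoundError, ValueError):
    index_number = 0  # initialize the index file with 0, as suggested

for offset, link in enumerate(links[index_number:], start=index_number):
    process(link)
    with open("index.txt", "w") as f:  # record progress after each link
        f.write(str(offset + 1))
```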
Upvotes: -1