Amjasd Masdhash

Reputation: 178

Continuing where I left off in Python

I have a list of links and I'm running a specific function on each one; the function takes about 25 seconds per link. I use Selenium to open each link and get its page source, then run my function on it. However, whenever I run the program and cancel it partway through, I have to start all over again.

Note: I get the links from the sitemaps of different websites.

Is there a way to save my progress and continue it later on?

Upvotes: 0

Views: 1332

Answers (3)

JeffC

Reputation: 25644

I would suggest that you write the links out to a file along with a date/time stamp of the last time each one was processed. When you write links to the file, make sure that you don't write the same link twice. You will also want to date/time stamp a link after you are done processing it.

Once you have this list, when the script starts you read the entire list and start processing links that haven't been processed in X days (or whatever your criterion is).

Steps:

  1. Load links file
  2. Scrape links from sitemap, compare to existing links from file, write any new links to file
  3. Find the first link that hasn't been processed in X days
  4. Process that link then write date/time stamp next to link, e.g.

    http://www.google.com,1/25/2019 12:00PM
    
  5. Go back to Step 3

Now any time you kill the run, the process will pick up where you left off.

NOTE: Just writing out the date may be enough; it depends on how often you want to refresh your list (hourly, etc.) and whether you need that much detail.
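
A minimal sketch of that bookkeeping in Python follows. The file name, the 7-day window standing in for "X days", and the stub functions scrape_sitemap_links and process are assumptions for illustration, not part of the answer:

import datetime
import os

LINKS_FILE = "links.csv"        # assumed name; one "url,timestamp" pair per line
STAMP_FMT = "%m/%d/%Y %I:%M%p"  # matches the 1/25/2019 12:00PM example above
REFRESH_DAYS = 7                # "X days"; pick your own criterion

def scrape_sitemap_links():
    # Placeholder for your sitemap scraper.
    return ["http://www.google.com"]

def process(url):
    # Placeholder for your ~25-second Selenium function.
    print("processing", url)

def load_links():
    # Map each link to the time it was last processed (None = never).
    links = {}
    if os.path.exists(LINKS_FILE):
        with open(LINKS_FILE) as f:
            for line in f:
                url, _, stamp = line.strip().partition(",")
                if url:
                    links[url] = (datetime.datetime.strptime(stamp, STAMP_FMT)
                                  if stamp else None)
    return links

def save_links(links):
    with open(LINKS_FILE, "w") as f:
        for url, stamp in links.items():
            suffix = "," + stamp.strftime(STAMP_FMT) if stamp else ""
            f.write(url + suffix + "\n")

def due(stamp):
    # Step 3: never processed, or last processed more than X days ago.
    return (stamp is None or
            datetime.datetime.now() - stamp > datetime.timedelta(days=REFRESH_DAYS))

links = load_links()
for url in scrape_sitemap_links():  # Step 2: add only links not already on file
    links.setdefault(url, None)
save_links(links)

for url, stamp in links.items():    # Steps 3-5
    if due(stamp):
        process(url)
        links[url] = datetime.datetime.now()
        save_links(links)           # persist the stamp immediately

Killing the run at any point loses at most the link currently being processed; on restart, everything already stamped within the window is skipped.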

Upvotes: 1

Gaurang Shah

Reputation: 12930

This code will work. I assume you already have a function for getting links; I have just used a dummy one, _get_links. You will have to delete the contents of the links file and reset the index file to 0 after every successful run.

import os
import time

def _get_links():
    # Dummy stand-in for your real link-gathering function.
    return ["a", "b", "c"]

def _get_links_from_file():
    # Return the saved links, or an empty list if the file is missing or empty.
    if not os.path.exists("links"):
        return []
    with open("links") as file:
        content = file.read().strip()
        return content.split(",") if content else []

def _do_something(link):
    print(link)
    time.sleep(30)

def _save_links_to_file(links):
    with open("links", "w") as file:
        file.write(",".join(links))
    print("links saved")

def _save_index_to_file(index):
    with open("index", "w") as file:
        file.write(str(index))
    print("index saved")

def _get_index_from_file():
    # Return the saved index, or 0 if no progress has been recorded yet.
    if not os.path.exists("index"):
        return 0
    with open("index") as file:
        return int(file.read().strip())

def process_links():
    links = _get_links_from_file()
    if len(links) == 0:
        links = _get_links()
        _save_links_to_file(links)
        start = 0
    else:
        start = _get_index_from_file()

    # Save the absolute index after each link so a cancelled run
    # resumes from the first unprocessed link.
    for index in range(start, len(links)):
        _do_something(links[index])
        _save_index_to_file(index + 1)

if __name__ == '__main__':
    process_links()

Upvotes: 2

HAKS

Reputation: 429

You should save the links in a text file. You should also save the index number in another text file, initialized to 0.

In your code, you can then loop through the links using something like:

for link in links[index_number:]:

At the end of every loop iteration, write the updated index number to the text file holding the index. This lets you continue from where you left off.
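
For example, here is a minimal sketch of this approach, assuming one link per line in the links file and a single integer in the index file; the file names and the process stub are placeholders:

import os

LINKS_FILE = "links.txt"  # assumed: one link per line
INDEX_FILE = "index.txt"  # assumed: count of links already processed

def process(link):
    # Placeholder for your ~25-second function.
    print("processing", link)

def read_index():
    # Default to 0 when no progress has been recorded yet.
    if not os.path.exists(INDEX_FILE):
        return 0
    with open(INDEX_FILE) as f:
        return int(f.read().strip())

def write_index(index):
    with open(INDEX_FILE, "w") as f:
        f.write(str(index))

with open(LINKS_FILE) as f:
    links = [line.strip() for line in f if line.strip()]

index_number = read_index()
for offset, link in enumerate(links[index_number:], start=index_number):
    process(link)
    write_index(offset + 1)  # persist progress after every link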

Upvotes: -1
