Jean Celestin

Reputation: 75

Scheduling and handling a time-consuming function to run at a given date

I have a Python function that should run at a given date and time. It grabs a video from a URL, converts it to a given format, and uploads it to a storage service.

This scheduled function calls 3 other functions synchronously (each has its own concern, following Separation of Concerns):

def getVideo(url):
    # 1. Download the video from a URL
    return scraped_video

def convertVideo(scraped_video):
    # 2. Convert the video to a given format
    return file_output_path

def sendVideo(file):
    # 3. Upload the video to a given Gdrive or Dropbox
    pass


def GrabAndConvertThenSend(url, notification_email):
    try:
        temp_video = getVideo(url)
        file_output_path = convertVideo(temp_video)
        sendVideo(file_output_path)
        # send notification email with the link
        # schedule the next run
    except Exception as e:
        print(e)

The function GrabAndConvertThenSend() is scheduled through APScheduler to run at the given date.

1. How do I implement retries?

Sometimes, due to network issues or API availability, the main function is aborted partway through. E.g. the function has already downloaded the video but failed to upload it to the storage.

How can I resume where it stopped without downloading the video again? I thought about storing the status (downloading, converting, uploading) in the database. Is that the right way?

2. Is it the right way to chain the functions like so? Or should I rely on events/listeners, or even on queuing jobs (when Task A finishes, it queues Task B)? If so, how would I implement this, given that my functions have separate concerns?

Upvotes: 0

Views: 291

Answers (1)

Serge Ballesta

Reputation: 149195

How do I implement retries?

This is a general design concept. Here you could use recovery points. The idea is to design the workflow so that the input of a step is removed only after that step has produced its full output. If the processing is later interrupted, you can restart the job after the last successful step. Because of the separation of concerns, this should not be implemented inside the step functions themselves but in a manager that is responsible for:

  • restarting after the last successful step
  • calling the next step after one succeeds
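
The "full output" part is what makes a recovery point trustworthy: a file under its final name must never be half-written. A minimal sketch of one common way to get that, assuming the step output is available as bytes (the helper name write_atomically and the ".part" suffix are illustrative, not part of the question's code):

```python
import os

def write_atomically(data, final_path):
    """Write data to final_path so that the file only appears once complete."""
    temp_path = final_path + ".part"   # hypothetical temporary name
    with open(temp_path, "wb") as f:
        f.write(data)
    # os.replace renames the finished file in one step; an interrupted run
    # leaves at most a ".part" file, never a truncated final_path.
    os.replace(temp_path, final_path)
```

With this, "does final_path exist?" is enough to decide whether the step already succeeded.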

In your pseudo code, this manager feature should be implemented in GrabAndConvertThenSend:

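import os

# Example locations for the recovery-point files; adapt them to your setup.
temp_path = "raw_video.part"
raw_video_path = "raw_video.bin"
file_output_path = "converted_video.bin"

def GrabAndConvertThenSend(url, notification_email):
    try:
        # Step 1: download, unless an earlier run already got at least this far.
        if not (os.path.exists(raw_video_path) or os.path.exists(file_output_path)):
            temp_video = getVideo(url)  # assumed to return the video content as bytes
            with open(temp_path, "wb") as f:
                f.write(temp_video)
            # Rename only once the file is complete: raw_video_path marks step 1 as done.
            os.replace(temp_path, raw_video_path)
        # Step 2: convert, unless a converted file is already present.
        if not os.path.exists(file_output_path):
            with open(raw_video_path, "rb") as f:
                file_output_temp_path = convertVideo(f.read())
            os.replace(file_output_temp_path, file_output_path)
            os.remove(raw_video_path)  # the input is removed only after the output exists
        # Step 3: upload, then remove the converted file.
        sendVideo(file_output_path)
        os.remove(file_output_path)
        # send notification email with the link
        # schedule the next run
    except Exception as e:
        print(e)
        # schedule with a shorter delay to finish processing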
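
The "schedule with a shorter time" comment can take several forms. One simple option, sketched here under the assumption that the job lets its exceptions propagate instead of only printing them, is a small retry wrapper. The name run_with_retries and its parameters are hypothetical; with a scheduler you would more likely register a new run a few minutes later rather than sleep in place.

```python
import time

def run_with_retries(job, max_attempts=3, delay_seconds=60.0, sleep=time.sleep):
    """Call job() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            job()
            return True          # the job finished, no further attempt needed
        except Exception as e:
            print(f"attempt {attempt} failed: {e}")
            if attempt < max_attempts:
                sleep(delay_seconds)  # wait before the next attempt
    return False                 # every attempt failed
```

Because the job restarts from its last recovery point, each new attempt only repeats the step that failed, not the whole download.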

Is it the right way to chain the functions like so?

This is a matter of taste. What matters are the features. The pseudo-code above implements recovery points and uses them for retries. You could use other job management tools if they meet your requirements, or just use a hand-written implementation like this one.
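
If you keep a hand-written manager, it can be made independent of the individual steps. The following is only a sketch under assumptions: each step is described by a pair (output_path, produce), where produce(input_path, temp_path) is a hypothetical callable that reads the previous step's file and writes its own result to temp_path. A final step without a file output, such as the upload, would be called after the pipeline returns.

```python
import os

def run_pipeline(steps):
    """Run the steps in order, resuming after the last output found on disk."""
    # Find the last step whose output already exists (the recovery point).
    start = 0
    for i, (output_path, _) in enumerate(steps):
        if os.path.exists(output_path):
            start = i + 1
    input_path = steps[start - 1][0] if start > 0 else None
    for output_path, produce in steps[start:]:
        temp_path = output_path + ".part"
        produce(input_path, temp_path)       # may raise; nothing is lost if it does
        os.replace(temp_path, output_path)   # the output is now complete
        if input_path is not None:
            os.remove(input_path)            # only now drop the step's input
        input_path = output_path
    return input_path                        # path of the last output
```

The step functions stay unaware of retries and file bookkeeping, which keeps the separation of concerns the question asks about.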

Upvotes: 1
