Zhli

Reputation: 380

How to track the progress of distributed tasks

Here is my case:

  1. When my server receives a request, it triggers distributed tasks, in my case many AWS Lambda functions (the peak could be 3000).
  2. I need to track the progress / status of each task, i.e. pending, running, success, or error.
  3. My server could have many replicas.
  4. I still want to know the progress / status of the tasks even if any of my server replicas goes down.

My current design:

  1. I chose AWS S3 as my helper.
  2. When a task starts executing, it creates a marker file in a special folder on S3, e.g. a running folder.
  3. When the task fails or succeeds, it moves the marker file from the running folder to a fail folder or a success folder.
  4. I check the marker files on S3 to see the progress of the tasks.

The problems:

  1. AWS S3 limits concurrent access.
  2. My case is likely to exceed that limit some day.

Attempted solutions:

  1. I have tried my best to reduce the number of requests to S3.
  2. I don't want to track the progress by storing data in my DB because my DB is already under heavy load.

To be honest, it is kind of weird to use marker files on S3 to track the progress of the tasks. However, it has worked so far.

Are there any recommendations?

Thanks in advance!

Upvotes: 2

Views: 1007

Answers (1)

Brad Irby

Reputation: 2542

This sounds like a perfect application of persistent event queueing, specifically Kinesis. As each Lambda starts it generates a “starting” event on Kinesis. When it succeeds or fails, it generates the appropriate event. You could even create progress events along the way if you want to see how far they have gotten.
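As a rough sketch of what each Lambda could do, assuming a boto3-style Kinesis client is passed in (its `put_record` call takes `StreamName`, `Data` and `PartitionKey`). The stream name, the event fields, and the `emit_event` / `run_task` helpers are made up for illustration and are not part of any AWS API:

```python
import json
import time


def emit_event(kinesis, stream_name, task_id, status, detail=None):
    """Publish one task lifecycle event to a Kinesis stream.

    `kinesis` is assumed to be a boto3 Kinesis client, e.g. boto3.client("kinesis").
    The event layout below is a hypothetical example, not a required format.
    """
    event = {
        "task_id": task_id,
        "status": status,  # e.g. "starting", "progress", "success", "error"
        "detail": detail,
        "timestamp": time.time(),
    }
    kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        # Assumption: partitioning by task id keeps one task's events on one shard.
        PartitionKey=task_id,
    )
    return event


def run_task(kinesis, stream_name, task_id, work):
    """Wrap the real work with a "starting" event and a final success/error event."""
    emit_event(kinesis, stream_name, task_id, "starting")
    try:
        result = work()
    except Exception as exc:
        emit_event(kinesis, stream_name, task_id, "error", detail=str(exc))
        raise
    emit_event(kinesis, stream_name, task_id, "success")
    return result
```

Inside a Lambda handler you would call something like `run_task(boto3.client("kinesis"), "task-events", task_id, do_work)`, where `"task-events"` and `do_work` are placeholders for your own stream and function.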

Your server can then monitor the number of starting events against ending events (success or failure) until these two numbers are equal. It can query the error events to see which processes failed and why. All servers can query the same events without disrupting each other, and any server can go down and recover without losing data.
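The counting on the server side might look something like this. It is only a sketch: it assumes the events have already been read from the stream and decoded into dicts with hypothetical `task_id`, `status` and `detail` fields. The optional `expected_total` argument is my own addition, useful if the server knows how many tasks it launched, since the two counts can briefly match before every task has started:

```python
from collections import Counter


def summarize(events, expected_total=None):
    """Compare starting events against ending events (success or error)."""
    counts = Counter(e["status"] for e in events)
    started = counts["starting"]
    finished = counts["success"] + counts["error"]
    # Collect the failed tasks together with whatever reason they reported.
    failures = [
        (e["task_id"], e.get("detail")) for e in events if e["status"] == "error"
    ]
    done = started > 0 and started == finished
    if expected_total is not None:
        done = done and finished == expected_total
    return {
        "started": started,
        "finished": finished,
        "running": started - finished,
        "done": done,
        "failures": failures,
    }
```

Because this only reads events, any replica can run it at any time and get the same answer.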

Make sure to put an Origination Key on events that are supposed to be grouped together so they don't get mixed up with events from a later batch. Also, give each Lambda its own key so you can trace progress per Lambda. GUIDs are perfect for this.
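To illustrate the two keys, here is a small sketch using Python's `uuid` module. The field names `origination_key` and `task_id` and the helper names are hypothetical, and the events are assumed to arrive in order for each task:

```python
import uuid


def new_origination_key():
    """One GUID per incoming request, shared by all tasks that request triggers."""
    return str(uuid.uuid4())


def new_task_keys(count):
    """One GUID per Lambda invocation, so each task can be traced on its own."""
    return [str(uuid.uuid4()) for _ in range(count)]


def status_by_task(events, origination_key):
    """Return the latest status of each task that belongs to one group."""
    latest = {}
    for e in events:
        if e["origination_key"] != origination_key:
            continue  # event belongs to a different request, ignore it
        latest[e["task_id"]] = e["status"]  # later events overwrite earlier ones
    return latest
```

The server would generate the origination key when the request comes in and pass it, together with a task key, in the payload of every Lambda it invokes.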

Upvotes: 1
