Reputation: 13
My problem is running a job after thousands of other jobs have finished running on AWS Batch. I have tried running the job in a job queue with lower priority, and running it in the same queue but submitting it after all the others (the documentation says that jobs are executed in approximately the order in which they are submitted). My question is whether either of these approaches (or some other) guarantees that the job will run after the others.
Upvotes: 1
Views: 837
Reputation: 5285
I wouldn't rely on either of those methods for a guarantee. Execution order is explicitly not guaranteed to match submission order. Priority "should" work, but at large scale it's likely that at some point something will delay your high-priority jobs and the scheduler will decide it has resources to spare for the low-priority queue.
You can rely on job dependencies. They allow you to specify that one job depends on N other jobs, and therefore must wait until they all finish before it begins running. This can even be chained - if A depends on B, and B depends on C, the order C -> B -> A is guaranteed. Unfortunately, N <= 20.
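As a rough sketch of the chaining idea, here is how it could look with boto3's Batch client. The helper name `submit_chain` and the queue and job definition names are made up for illustration; the only real API call used is `submit_job` with its `dependsOn` parameter. The client is passed in as an argument so the function can be tried against a stub before pointing it at a real account.

```python
def submit_chain(batch, job_names, job_queue, job_definition):
    """Submit jobs so that each one waits for the previous one.

    `batch` is expected to behave like boto3.client("batch").
    Returns the job IDs in submission (and execution) order.
    """
    job_ids = []
    for name in job_names:
        request = {
            "jobName": name,
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
        }
        if job_ids:
            # Depend on the job submitted just before this one.
            request["dependsOn"] = [{"jobId": job_ids[-1]}]
        response = batch.submit_job(**request)
        job_ids.append(response["jobId"])
    return job_ids


# Hypothetical usage (queue and definition names are placeholders):
#   import boto3
#   submit_chain(boto3.client("batch"), ["C", "B", "A"], "my-queue", "my-job-def")
```

Submitting in the order C, B, A means each job only needs the ID of the one before it, which is why the list is given in execution order.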
The best way to depend on a lot of jobs (more than 20) is to depend on a single array job that contains all of them. On a related note, an array job can also be configured to make its child jobs serially dependent (they execute in array index order). The only caveat is that you have to put all your jobs into an array. If the thousands of jobs you want to depend on aren't already in an array, there are ways of manipulating them into one. For example, if you're processing 1000 files, you can put the files in a list and have each child job pick its file from the list using its array index.
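Something along these lines is what I have in mind, again as a sketch with boto3's `submit_job`. The function names, job names, queue and job definitions are placeholders I made up. The real pieces are `arrayProperties={"size": ...}`, `dependsOn` (with `{"type": "SEQUENTIAL"}` on the array job itself if you want the children to run one after another), and the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable that Batch sets inside each child job.

```python
import os


def submit_fan_in(batch, n_tasks, job_queue, worker_definition,
                  final_definition, sequential=False):
    """Submit one array job plus a final job that waits for the whole array.

    `batch` is expected to behave like boto3.client("batch").
    Returns (array_job_id, final_job_id).
    """
    array_request = {
        "jobName": "process-files",          # placeholder name
        "jobQueue": job_queue,
        "jobDefinition": worker_definition,
        "arrayProperties": {"size": n_tasks},
    }
    if sequential:
        # Child jobs then run one at a time, in array index order.
        array_request["dependsOn"] = [{"type": "SEQUENTIAL"}]
    array_job = batch.submit_job(**array_request)

    # A single dependency on the array job covers every child in it.
    final_job = batch.submit_job(
        jobName="after-all-files",           # placeholder name
        jobQueue=job_queue,
        jobDefinition=final_definition,
        dependsOn=[{"jobId": array_job["jobId"]}],
    )
    return array_job["jobId"], final_job["jobId"]


def pick_input(files, environ=os.environ):
    """Inside a child job: choose this child's file by its array index."""
    index = int(environ["AWS_BATCH_JOB_ARRAY_INDEX"])
    return files[index]
```

The worker container would load the same file list (from S3, a manifest baked into the image, or wherever you keep it) and call `pick_input` to find its own item, so the list must be in the same order for every child.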
Upvotes: 2