Cherry

Reputation: 33608

How to control the number of parallel job runs in AWS Batch?

AWS Batch supports up to 10,000 jobs in one array. But what if each job writes to DynamoDB? In that situation the write rate needs to be controlled. How can I do that? Is there a setting to keep only N jobs in the running state and not launch the others?

Upvotes: 3

Views: 1848

Answers (1)

Derrops

Reputation: 8137

The easiest way would be to send the DynamoDB jobs to an SQS queue and have workers or Lambdas poll this queue at a rate you specify. That is the classic approach to rate limiting in the AWS world. I would calculate what that rate should be in capacity units and configure your table's capacity to match the queue polling rate.
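As a rough illustration of the worker side, here is a minimal sketch of a consumer that drains a queue no faster than a fixed rate. It is an assumption-laden stand-in, not a definitive implementation: an in-memory `deque` plays the role of the SQS queue, and `drain_at_rate`, `handle`, and `jobs_per_second` are hypothetical names chosen for this example. In a real worker, the `deque` would be replaced by messages received from SQS, and `handle` would perform the DynamoDB write.

```python
import time
from collections import deque
from typing import Any, Callable


def drain_at_rate(
    jobs: deque,
    handle: Callable[[Any], None],
    jobs_per_second: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Process queued jobs, starting at most `jobs_per_second` jobs per second.

    `jobs` is an in-memory stand-in for an SQS queue (assumption).
    `sleep` and `clock` are injectable so the pacing can be tested.
    Returns the number of jobs processed.
    """
    interval = 1.0 / jobs_per_second  # minimum spacing between job starts
    next_start = clock()
    processed = 0
    while jobs:
        now = clock()
        if now < next_start:
            # Too early for the next job: wait out the rest of the interval.
            sleep(next_start - now)
        handle(jobs.popleft())
        processed += 1
        next_start += interval
    return processed


# Usage: three hypothetical jobs, paced at 100 jobs per second.
written = []
count = drain_at_rate(deque(["job-1", "job-2", "job-3"]), written.append, 100)
print(count, written)
```

The pacing is per worker, so with several workers polling the same queue the combined rate is the sum of their individual rates.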

Keep in mind that other processes may be accessing your DynamoDB table and using up its capacity, and note the retention period of the queue you set up. You may also benefit considerably in speed and cost from caching read jobs; have a look at DAX for that.

Edit: Just to address your comments. As you say, if your table has 20 capacity units, you can only execute 10 jobs per second if each job uses 2 units per second. Say you submit 10,000 jobs: at 10 jobs a second, that will take 1,000 seconds to process. If, however, you submit more than 3,456,000 jobs, they will take more than 4 days to process at 10 jobs a second. The default retention period for SQS is 4 days, so you would start losing messages/jobs at this rate.
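The arithmetic above can be reproduced with a short sketch. The function names are hypothetical and chosen for this example; the figures (20 capacity units, 2 units per job, 4-day retention) are the ones from the paragraph above.

```python
# Default SQS message retention period: 4 days, expressed in seconds.
SQS_DEFAULT_RETENTION_SECONDS = 4 * 24 * 60 * 60  # 345,600 s


def max_jobs_per_second(table_capacity_units: float, units_per_job: float) -> float:
    """Highest job rate the table's capacity can sustain."""
    return table_capacity_units / units_per_job


def drain_time_seconds(job_count: int, jobs_per_second: float) -> float:
    """Time needed to work through `job_count` jobs at a fixed rate."""
    return job_count / jobs_per_second


def max_jobs_within_retention(
    jobs_per_second: float,
    retention_seconds: int = SQS_DEFAULT_RETENTION_SECONDS,
) -> int:
    """Largest backlog that can be drained before messages start to expire."""
    return int(jobs_per_second * retention_seconds)


rate = max_jobs_per_second(table_capacity_units=20, units_per_job=2)
print(rate)                              # 10.0 jobs per second
print(drain_time_seconds(10_000, rate))  # 1000.0 seconds
print(max_jobs_within_retention(rate))   # 3456000 jobs
```

This assumes the table has no other consumers; any capacity used elsewhere lowers the sustainable rate accordingly.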

And as I mentioned, other processes accessing your table could push its usage past 20 units, so you will need to be very careful when approaching your table's limit.

Upvotes: 1
