Rajesh

Reputation: 19

Parallel processing with load balancing on AWS

I have the use case below and need help figuring out the best options on AWS.

  1. I have a Python script which needs to be executed for 200 different datasets.
  2. Each dataset needs to run on its own AWS instance. I can have at most 10 instances, so I need 20 rounds of 10 parallel runs to complete the 200 jobs.
  3. All the instances will use a common MongoDB instance to store and read data for the Python scripts.
  4. This is not a web application, just a simple Python script invocation.
  5. The Python script doesn't provide an exit code once it has completed (it is a third-party script and I don't have control over it). So I need a way to detect when an AWS instance has finished its job, so that I can send it the next dataset to process (a kind of load balancing).

Upvotes: 0

Views: 265

Answers (1)

Andreas

Reputation: 984

Sounds like a typical use case for SQS, a distributed queue.

  • Auto Scaling Group managing EC2 Instances
  • SQS queue managing calculation jobs
  • Small script polling new jobs from SQS and executing Python script
  • CloudWatch alarms scaling up and down Auto Scaling Group based on number of jobs in SQS queue
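The "small script" in the third bullet could look roughly like the sketch below. It is only an illustration under assumptions, not a tested deployment: `enqueue_datasets`, `run_worker`, `script_command` and the `max_empty_polls` parameter are hypothetical names, each message body is assumed to be a dataset id, and the `sqs` argument is assumed to be a boto3 SQS client (`boto3.client("sqs")`). Because `subprocess.run` blocks until the child process ends, the worker knows a job is done even though the third-party script reports no status of its own.

```python
import subprocess
import sys


def script_command(script_path):
    """Build the command that runs a Python script with the current interpreter."""
    return [sys.executable, script_path]


def enqueue_datasets(sqs, queue_url, dataset_ids):
    """Send one SQS message per dataset; the message body is the dataset id."""
    for dataset_id in dataset_ids:
        sqs.send_message(QueueUrl=queue_url, MessageBody=str(dataset_id))


def run_worker(sqs, queue_url, command, max_empty_polls=3):
    """Pull dataset ids from the queue and run `command + [dataset_id]` for each.

    Stops after `max_empty_polls` consecutive empty polls and returns the
    list of dataset ids that were processed.
    """
    processed = []
    empty_polls = 0
    while empty_polls < max_empty_polls:
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,  # one job per instance at a time
            WaitTimeSeconds=20,     # long polling instead of busy looping
        )
        messages = response.get("Messages", [])
        if not messages:
            empty_polls += 1
            continue
        empty_polls = 0
        message = messages[0]
        dataset_id = message["Body"]
        # Blocks until the script's process has ended, whatever it returns.
        subprocess.run(command + [dataset_id], check=False)
        # Delete only after the run has ended; if the instance dies mid-job,
        # the message becomes visible again once the visibility timeout expires.
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        processed.append(dataset_id)
    return processed
```

On each instance this would be started with something like `run_worker(boto3.client("sqs"), queue_url, script_command("process_dataset.py"))`, where the script name is a placeholder. The queue's visibility timeout should be longer than the longest expected run, otherwise a second instance may pick up a job that is still in progress.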

General approach: http://docs.aws.amazon.com/autoscaling/latest/userguide/as-using-sqs-queue.html

Using PaaS Elastic Beanstalk for this kind of setup: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html

Example implementation: https://cloudonaut.io/antivirus-for-s3-buckets/

Upvotes: 1
