Reputation: 33
I am using AWS EC2 instances for bioinformatics work. I have a large number (~1000) of large files that need to be processed with a script on EC2 instances, with the results uploaded back to an S3 bucket. I want to distribute the jobs (one per file) across a number of EC2 instances, preferably launched at Spot prices.
What I need is a simple-to-use queuing system (possibly AWS SQS, or something else) that can distribute jobs to instances and restart jobs if an instance fails (because the Spot price rises too high, or for other reasons). I have studied AWS SQS examples, but they are too advanced and typically involve auto-scaling and complicated message-generating applications.
Can someone point out, conceptually, the best and easiest way to solve this? Are there any examples of this simple application of AWS SQS? How should a bunch of instances be started, and how do I tell them to listen to the queue?
My workflow is basically like this for each input file:
aws s3 cp s3://mybucket/file localFile ## Possibly streaming the file without copy
work.py --input localFile --output outputFile
aws s3 cp outputFile s3://mybucket/output/outputFile
Upvotes: 2
Views: 5240
Reputation: 730
In roughly December 2016, AWS launched a service called AWS Batch, which may be a good (perhaps even great) fit for the workload described in the question. Please review Batch before selecting one of the other suggestions.
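As a rough illustration (not part of the original answer), the per-file workflow could be expressed as one submitted Batch job per input file. The sketch below uses boto3 (the current AWS SDK for Python); the bucket and region are reused from this page, while the input/ prefix, job queue name, and job definition name are hypothetical placeholders, and it assumes a job definition whose container runs the download / work.py / upload steps:
#!/usr/bin/env python
# Hypothetical sketch: submit one AWS Batch job per input file.
# Assumes a Batch job queue ('genomics-queue') and job definition
# ('work-py-job') already exist, and that the job definition's
# container performs the download / work.py / upload steps.
import boto3

batch = boto3.client('batch', region_name='ap-southeast-2')
s3 = boto3.client('s3', region_name='ap-southeast-2')

paginator = s3.get_paginator('list_objects_v2')
job_number = 0
for page in paginator.paginate(Bucket='mybucket', Prefix='input/'):
    for obj in page.get('Contents', []):
        job_number += 1
        batch.submit_job(
            jobName='work-py-{0}'.format(job_number),
            jobQueue='genomics-queue',        # hypothetical queue name
            jobDefinition='work-py-job',      # hypothetical job definition
            containerOverrides={
                'environment': [{'name': 'INPUT_KEY', 'value': obj['Key']}]
            },
        )
        print('Submitted job for', obj['Key'])
Batch can then take care of launching (optionally Spot-priced) compute resources and retrying failed jobs, which is essentially a managed version of the queue-plus-workers pattern described in the answer below.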
Upvotes: 5
Reputation: 270124
You are describing a very common batch-oriented design pattern: a queue of jobs, and a fleet of worker instances that pull jobs from the queue and process them.
The best way to accomplish this is as follows. Rather than having the queueing system "distribute jobs to instances and restart jobs if an instance fails", SQS would merely be used to store the jobs. Auto Scaling would be responsible for launching instances (including replacing them when Spot price changes terminate them), and the instances themselves would pull work from the queue. Think of it as a "pull" model rather than a "push" model.
While the total system might seem complex, each individual component is quite simple. I would recommend building it one step at a time.
First, push one message into an Amazon SQS queue for each input file. This can be done by calling aws sqs send-message from the CLI, or by adding a few lines of code in Python using Boto (the AWS SDK for Python). Here's some sample code (call it with a message on the command-line):
#!/usr/bin/env python2.7
import boto, boto.sqs
from boto.sqs.message import Message
from optparse import OptionParser
# Parse command line
parser = OptionParser()
(options, args) = parser.parse_args()
# Send to SQS
q_conn = boto.sqs.connect_to_region('ap-southeast-2')
q = q_conn.get_queue('my-queue')
m = Message()
m.set_body(args[0])
print q.write(m)
print args[0] + ' pushed to Queue'
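To seed the queue with one job per input file, a small variation of the same script can list the bucket and push each key as a message. This is only a sketch, not part of the original answer: it reuses my-queue, mybucket, and the region from above, and assumes (hypothetically) that the input files sit under an input/ prefix.
#!/usr/bin/env python2.7
import boto, boto.sqs
from boto.sqs.message import Message
# List the input files in S3 and push one message per key
s3_conn = boto.connect_s3()
bucket = s3_conn.get_bucket('mybucket')
q_conn = boto.sqs.connect_to_region('ap-southeast-2')
q = q_conn.get_queue('my-queue')
for key in bucket.list(prefix='input/'):   # 'input/' prefix is an assumption
    m = Message()
    m.set_body(key.name)                   # the S3 key becomes the job payload
    q.write(m)
    print key.name + ' pushed to Queue'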
Next, launch the worker instances (Auto Scaling with a Spot price request works well here, since it will also replace instances that get terminated) and use the User Data field to trigger the work when each instance starts. Either run your workflow from a shell script, or write the S3 download/upload code as part of your Python app too (including the loop to keep pulling new messages). Here's some code to retrieve a message from a queue:
#!/usr/bin/env python2.7
import boto, boto.sqs
from boto.sqs.message import Message
# Connect to Queue
q_conn = boto.sqs.connect_to_region('ap-southeast-2')
q = q_conn.get_queue('my-queue')
# Get a message (it becomes visible again after the visibility timeout
# unless it is deleted, which is how failed jobs get retried)
m = q.read(visibility_timeout=15)
if m is None:
    print "No message!"
else:
    print m.get_body()
    q.delete_message(m)
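To tie the steps together, the program launched from User Data could loop like this: pull a message (which holds the S3 key of an input file), download the file, run work.py, upload the result, and only then delete the message. This is just a sketch reusing the names above (my-queue, mybucket, an output/ prefix); the visibility timeout and output file naming are illustrative and should match your own setup.
#!/usr/bin/env python2.7
# Sketch of a worker loop: pull a job (an S3 key) from the queue,
# download the file, run work.py, upload the result, delete the message.
import os, time, subprocess
import boto, boto.sqs

q_conn = boto.sqs.connect_to_region('ap-southeast-2')
q = q_conn.get_queue('my-queue')
s3_conn = boto.connect_s3()
bucket = s3_conn.get_bucket('mybucket')

while True:
    # The visibility timeout should be longer than one file takes to process,
    # so the message only reappears if this instance dies mid-job.
    m = q.read(visibility_timeout=3600)
    if m is None:
        time.sleep(30)          # queue is empty (or temporarily quiet)
        continue
    key_name = m.get_body()
    local_file = os.path.basename(key_name)
    output_file = local_file + '.out'
    # Download, process, upload -- the same three steps as the original workflow
    bucket.get_key(key_name).get_contents_to_filename(local_file)
    subprocess.check_call(['python', 'work.py',
                           '--input', local_file,
                           '--output', output_file])
    out_key = bucket.new_key('output/' + output_file)
    out_key.set_contents_from_filename(output_file)
    # Only delete the message once the result is safely in S3
    q.delete_message(m)
Because the message is deleted only after the upload succeeds, a job interrupted by a Spot termination simply becomes visible again after the timeout and is picked up by another instance.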
Upvotes: 11