Reputation: 1066
I'm working on a small project to get acquainted with Amazon Web Services. I'm trying to build a simple web application: when a button is pressed, a MapReduce job is launched and its output is returned to the browser. What would be the best way to do this? Also, is there a way to launch an Amazon Elastic MapReduce job from the command line?
Upvotes: 0
Views: 431
Reputation: 8722
You can use the AWS SDK for whatever language your web application is written in to call EMR and submit a job. I work mostly with Python, so I'm most familiar with the Python Boto library, which makes it pretty painless to upload code and data to S3, configure a job flow, and launch it.
You won't want to launch the job and return the results in the same HTTP request, because it takes several minutes just to start the cluster before the job can run. A web application with pages that don't respond for minutes isn't a good user experience. Submitting a job flow, however, seems to take only a few seconds. Create the job flow and keep track of the job flow IDs in your web application. Given a job flow ID, you shouldn't have much trouble retrieving log data or output when the user comes back and the job is complete.
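The "keep track of the job flow IDs" step could look something like the sketch below. It assumes `conn` is the connection returned by `boto.connect_emr()` and that the IDs were saved when the jobs were submitted; the helper names `jobflow_status` and `status_page_data` are made up for illustration and are not part of Boto.

```python
# States after which a job flow makes no further progress.
TERMINAL_STATES = {"COMPLETED", "FAILED", "TERMINATED"}


def jobflow_status(conn, jobflow_id):
    """Return (state, finished) for one job flow.

    `conn` is assumed to be a boto EmrConnection, e.g. the result of
    boto.connect_emr(). Its describe_jobflow() call returns an object
    whose `state` attribute holds the state as a string.
    """
    state = conn.describe_jobflow(jobflow_id).state
    return state, state in TERMINAL_STATES


def status_page_data(conn, jobflow_ids):
    """Build a {jobflow_id: state} mapping for the IDs the app has stored."""
    return {jid: jobflow_status(conn, jid)[0] for jid in jobflow_ids}
```

A request handler can call `status_page_data` each time the user reloads a status page, so no single request has to wait for the cluster.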
Here's an example of how one could launch an Elastic MR job with Boto:
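import boto
from boto.emr.step import StreamingStep

# Credentials are read from the environment or the boto config file.
conn = boto.connect_emr()

# A Hadoop streaming step: word count over the public sample input.
step = StreamingStep(name='My wordcount example',
                     mapper='s3n://elasticmapreduce/samples/wordcount/wordSplitter.py',
                     reducer='aggregate',
                     input='s3n://elasticmapreduce/samples/wordcount/input',
                     output='s3n://<my output bucket>/output/wordcount_output')

# run_jobflow returns the job flow ID; store it so the job can be looked up later.
jobid = conn.run_jobflow(name='My jobflow',
                         log_uri='s3://<my log uri>/jobflow_logs',
                         steps=[step])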
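Once the job flow has completed, the output can be read back from S3. This is only a rough sketch: it assumes `bucket` is a Boto S3 bucket object (for example from `boto.connect_s3().get_bucket(...)`) and that the step wrote the usual Hadoop `part-*` files under the output path; `read_job_output` is a hypothetical helper name.

```python
def read_job_output(bucket, prefix):
    """Concatenate the Hadoop part files stored under `prefix`.

    `bucket` is assumed to be a boto S3 Bucket, e.g.
    boto.connect_s3().get_bucket('<my output bucket>').
    """
    chunks = []
    for key in bucket.list(prefix=prefix):
        # Hadoop writes results as part-00000, part-00001, ...;
        # skip marker files such as _SUCCESS.
        filename = key.name.rsplit("/", 1)[-1]
        if filename.startswith("part-"):
            chunks.append(key.get_contents_as_string())
    return b"".join(chunks)
```

For the word count example, the prefix would be `output/wordcount_output/`.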
Upvotes: 2
Reputation: 184
Did you give this a look yet? http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873 It's from the dev side and might help you along.
Upvotes: 0