Rubin Porwal

Reputation: 3845

How to install s3cmd on an Amazon EMR instance

I am building an Apache Spark application which will be executed on an EMR instance. For that I create a cluster and then add steps to the cluster to execute the Spark application.

In the Spark application I need to perform read/write operations on S3. To interact with S3 I need s3cmd installed on the EMR instance, and I need to install and configure it while creating the EMR cluster using --bootstrap-application.

How do I install and configure s3cmd using bootstrap-application?

Any details on how to do this would be appreciated.

Upvotes: 2

Views: 731

Answers (2)

Stephen

Reputation: 3465

Use a custom bootstrap action.

"Bootstrap actions are scripts that are run on the cluster nodes when Amazon EMR launches the cluster. They run before Hadoop starts and before the node begins processing data."

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-bootstrap.html#bootstrapCustom
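A minimal sketch of what such a bootstrap action could look like, not a tested recipe. It assumes that pip is available on the node image, that the script is uploaded to a bucket you own (the `s3://my-bucket/...` path is hypothetical), and that the cluster is created through an API taking the `BootstrapActions` structure that boto3's `run_job_flow` accepts. Configuring s3cmd (credentials, `~/.s3cfg`) is left out here because it depends on your setup.

```python
def s3cmd_bootstrap_script():
    """Return the text of a hypothetical bootstrap script that installs s3cmd.

    Assumption: pip is present on the node image and sudo is allowed.
    """
    lines = [
        "#!/bin/bash",
        "set -e",
        "# Assumption: pip is available on the node image.",
        "sudo pip install s3cmd",
        "",
    ]
    return "\n".join(lines)


def bootstrap_actions(script_s3_path, name="Install s3cmd"):
    """Build a BootstrapActions list pointing at the uploaded script.

    The shape matches the BootstrapActions parameter of boto3's
    EMR run_job_flow call; the path itself is a placeholder.
    """
    return [
        {
            "Name": name,
            "ScriptBootstrapAction": {"Path": script_s3_path, "Args": []},
        }
    ]


if __name__ == "__main__":
    # Save this text as install-s3cmd.sh and upload it to your bucket.
    print(s3cmd_bootstrap_script())
    # Hypothetical S3 location of the uploaded script.
    print(bootstrap_actions("s3://my-bucket/bootstrap/install-s3cmd.sh"))
```

The returned list would be passed as the `BootstrapActions` argument when creating the cluster, so the script runs on every node before Hadoop starts.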

Upvotes: 0

Matt Domsch

Reputation: 486

https://dbaumgartel.wordpress.com/2014/04/10/an-elastic-mapreduce-streaming-example-with-python-and-ngrams-on-aws/

gives an example of pushing Python scripts to S3 (using s3cmd on your local computer) for use in the EMR application. You then push your source data to S3, and the EMR application puts its results into S3. You can use s3cmd on your local computer both to push the source data and to download the results.

If your source data is already in S3 or elsewhere in AWS, you can always create a new EC2 instance in which to run s3cmd to get the data into the right S3 bucket for processing.
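As a rough illustration of the push and download steps described in this answer, here is a small sketch that builds `s3cmd put` and `s3cmd get` command lines and runs them with `subprocess`. It assumes s3cmd is already installed and configured on the machine where it runs; the bucket, key, and file names are hypothetical.

```python
import subprocess


def put_command(local_path, bucket, key):
    """Build an `s3cmd put` command that uploads one local file."""
    return ["s3cmd", "put", local_path, "s3://{}/{}".format(bucket, key)]


def get_command(bucket, key, local_path, recursive=False):
    """Build an `s3cmd get` command; recursive=True fetches a whole prefix."""
    command = ["s3cmd", "get"]
    if recursive:
        command.append("--recursive")
    command += ["s3://{}/{}".format(bucket, key), local_path]
    return command


def run(command):
    """Run the command and raise CalledProcessError if s3cmd fails."""
    subprocess.check_call(command)


if __name__ == "__main__":
    # Hypothetical names; replace with your own bucket and paths.
    print(put_command("input.txt", "my-bucket", "input/input.txt"))
    print(get_command("my-bucket", "output/", "results/", recursive=True))
```

Calling `run(put_command(...))` before starting the EMR steps and `run(get_command(...))` after they finish covers the local side of the workflow.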

Upvotes: 2
