Reputation: 3845
I am building an Apache Spark application that will be executed on an EMR cluster. For that I create a cluster and then add steps to it to run the Spark application.
In the Spark application I need to perform read/write operations against S3. To interact with S3 I need s3cmd installed on the EMR instances, and I want to install and configure it while creating the EMR cluster using a bootstrap action.
How can I install and configure s3cmd using a bootstrap action?
Upvotes: 2
Views: 731
Reputation: 3465
Use a custom bootstrap action.
"Bootstrap actions are scripts that are run on the cluster nodes when Amazon EMR launches the cluster. They run before Hadoop starts and before the node begins processing data."
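A minimal sketch of what that could look like: a bootstrap script that installs s3cmd on each node, plus a `create-cluster` call that registers it via `--bootstrap-actions`. The bucket name, script name, cluster settings, and credential placeholders below are all assumptions for illustration; in practice you would prefer the instances' IAM role over embedding keys.

```shell
#!/bin/bash
# install-s3cmd.sh -- hypothetical bootstrap script. Upload it to your own
# S3 bucket before creating the cluster; EMR runs it on every node.
set -e

# Install s3cmd (available via pip on Amazon Linux EMR AMIs).
sudo pip install s3cmd

# Write a minimal s3cmd config non-interactively. The keys here are
# placeholders -- inject real credentials securely, or rely on the
# instance's IAM role instead of static keys.
cat > /home/hadoop/.s3cfg <<EOF
[default]
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
EOF

# --- From your local machine, create the cluster and attach the
# --- bootstrap action (bucket/name/instance settings are examples):
#
# aws emr create-cluster \
#   --name "spark-s3cmd-cluster" \
#   --release-label emr-5.30.0 \
#   --applications Name=Spark \
#   --instance-type m5.xlarge \
#   --instance-count 3 \
#   --use-default-roles \
#   --bootstrap-actions Path=s3://mybucket/install-s3cmd.sh,Name=InstallS3cmd
```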
Upvotes: 0
Reputation: 486
The referenced example pushes Python scripts to S3 (using s3cmd on your local computer), which are then used by the EMR application. You then push your source data to S3, and the EMR application writes its results back to S3. You can use s3cmd on your local computer to upload the source data and download the results.
If your source data is already in S3 or elsewhere in AWS, you can always create a new EC2 instance on which to run s3cmd to move the data into the right S3 bucket for processing.
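The local workflow above can be sketched with s3cmd like this; the bucket and file names are hypothetical placeholders:

```shell
# One-time interactive setup of s3cmd credentials on your local machine.
s3cmd --configure

# Push the application script and source data to S3
# (bucket and file names are examples only).
s3cmd put my_spark_job.py s3://my-emr-bucket/scripts/
s3cmd put input.csv s3://my-emr-bucket/input/

# After the EMR step finishes, download the results.
s3cmd get --recursive s3://my-emr-bucket/output/ ./results/
```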
Upvotes: 2