I am just getting started with AWS and have been playing around with EMR and CloudFormation. My goal is to write a CloudFormation template that will:
1. Create an EMR cluster with Spark and Hadoop installed.
2. Run Spark jobs on the EMR cluster. Jobs will be submitted as JAR or PySpark files.
I have been able to complete step 1, but I am not sure how step 2 is supposed to be done with CloudFormation.
I have looked at a couple of examples in the AWS documentation and on other sites, but I could not find one that deploys a Spark job via a CloudFormation template.
Any examples or pointers in the right direction would be very helpful. Thanks in advance!
Change your EMR CloudFormation template as follows. First, add these parameters to the Parameters section:
StepScriptFilePath:
  Type: String
  Description: Step script to run a bash script, or add a Java file here
  Default: 's3://s3-bucket/steps/step1.sh'
StepScriptFilePython:
  Type: String
  Description: Step script to run a Python file
  Default: 's3://s3-bucket/steps/step2.py'
StepJar:
  Type: String
  Description: JAR that runs the step (script-runner.jar runs the script passed in Args)
  Default: 's3://elasticmapreduce/libs/script-runner/script-runner.jar'
Then add this under the Properties of the AWS::EMR::Cluster resource:
Steps:
  - ActionOnFailure: CONTINUE
    HadoopJarStep:
      Args:
        - Ref: StepScriptFilePath
      Jar:
        Ref: StepJar
      MainClass: ''
    Name: run any bash or java job in spark
  - ActionOnFailure: CONTINUE
    HadoopJarStep:
      Args:
        - "spark-submit"
        - Ref: StepScriptFilePython
      Jar: command-runner.jar
    Name: run a python script job
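For the JAR case from the question, the same command-runner.jar pattern should work with spark-submit's --class option. Below is a minimal sketch, assuming hypothetical parameters SparkJarPath and SparkMainClass and a hypothetical logical ID EMRCluster for the cluster resource (none of these are in the template above). It passes --deploy-mode cluster so that the driver runs inside YARN rather than in the step's own process:

```yaml
Parameters:
  SparkJarPath:              # hypothetical parameter
    Type: String
    Description: S3 path of the Spark application JAR
    Default: 's3://s3-bucket/steps/my-spark-app.jar'
  SparkMainClass:            # hypothetical parameter
    Type: String
    Description: Fully qualified main class of the Spark application
    Default: 'com.example.Main'
Resources:
  EMRCluster:                # hypothetical logical ID of the existing cluster resource
    Type: AWS::EMR::Cluster
    Properties:
      # ...the existing cluster properties (Instances, ReleaseLabel, Applications, ...) stay here...
      Steps:
        # This entry sits in the same Steps list as the bash and Python steps
        - ActionOnFailure: CONTINUE
          HadoopJarStep:
            Jar: command-runner.jar
            Args:
              - spark-submit
              - "--deploy-mode"
              - cluster
              - "--class"
              - Ref: SparkMainClass
              - Ref: SparkJarPath
          Name: run a Spark JAR job
```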
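Another option is to declare each job as its own AWS::EMR::Step resource instead of listing it in the cluster's Steps property. As far as I know, changing the cluster's Steps property makes CloudFormation replace the whole cluster. A separate step resource needs only the cluster ID, which Ref on the AWS::EMR::Cluster resource returns. Here is a sketch that reuses the StepScriptFilePython parameter and assumes the cluster's logical ID is EMRCluster (hypothetical):

```yaml
Resources:
  PySparkStep:
    Type: AWS::EMR::Step
    Properties:
      Name: run a python script job
      ActionOnFailure: CONTINUE
      # Ref on the cluster resource returns its cluster ID; EMRCluster is a hypothetical logical ID
      JobFlowId:
        Ref: EMRCluster
      HadoopJarStep:
        Jar: command-runner.jar
        Args:
          - spark-submit
          - Ref: StepScriptFilePython
```

This assumes the cluster stays up when it has no steps of its own, i.e. its Instances section sets KeepJobFlowAliveWhenNoSteps: true. Otherwise the cluster may terminate before the step resource is added.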