activelearner

Reputation: 7745

How to run a Spark job on EMR via CloudFormation

I am just getting started with AWS and have been playing around with EMR and CloudFormation. My goal is to write a CloudFormation template that will:

1. Create an EMR cluster with Spark and Hadoop installed
2. Run Spark jobs on the EMR cluster. Jobs will be submitted as JAR or PySpark files.

I have been able to successfully complete Step 1 but I am not sure how Step 2 is supposed to be done via CloudFormation.

I have looked at examples in the AWS documentation and on other sites, but I could not find one where a Spark job is deployed via a CloudFormation template.

Any examples or pointers in the right direction would be very helpful. Thanks in advance!

Upvotes: 4

Views: 2266

Answers (1)

zohaib ahmad

Reputation: 39

Change your EMR CloudFormation template as follows. First, add these to the Parameters section:

StepScriptFilePath:
  Type: String
  Description: Step script to run a bash script, or add a Java file here
  Default: 's3://s3-bucket/steps/step1.sh'
StepScriptFilePython:
  Type: String
  Description: Step script to run a Python file
  Default: 's3://s3-bucket/steps/step2.py'
StepJar:
  Type: String
  Description: Jar used to run the step script (script-runner by default)
  Default: 's3://elasticmapreduce/libs/script-runner/script-runner.jar'

Then add this under the Properties of the EMR cluster resource:

  Steps:
    - ActionOnFailure: CONTINUE
      HadoopJarStep:
        Args:
          - Ref: StepScriptFilePath
        Jar:
          Ref: StepJar
        MainClass: ''
      Name: run any bash or java job in spark
    - ActionOnFailure: CONTINUE
      HadoopJarStep:
        Args:
          - "spark-submit"
          - Ref: StepScriptFilePython
        Jar: command-runner.jar
      Name: run a python script job
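
For the JAR case from the question, a similar step should work by passing the jar to spark-submit through command-runner.jar. This is only a sketch under assumptions: the names SparkJarStep, EMRCluster and SparkAppJar, and the class com.example.Main, are placeholders that are not in the template above, so replace them with your own. It uses a separate AWS::EMR::Step resource, which attaches a step to the cluster by its ID instead of listing it under the cluster's Steps property.

```yaml
# Hypothetical sketch: SparkJarStep, EMRCluster, SparkAppJar and
# com.example.Main are placeholder names, not part of the original template.
SparkJarStep:
  Type: AWS::EMR::Step
  Properties:
    Name: run a spark jar job
    ActionOnFailure: CONTINUE
    # Ref on an AWS::EMR::Cluster resource returns the cluster ID
    JobFlowId:
      Ref: EMRCluster
    HadoopJarStep:
      Jar: command-runner.jar
      Args:
        - "spark-submit"
        - "--deploy-mode"
        - "cluster"
        - "--class"
        - "com.example.Main"
        # String parameter holding the S3 path of the application jar
        - Ref: SparkAppJar
```

Here SparkAppJar would be declared in the Parameters section as a String, the same way as StepScriptFilePython.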

Upvotes: 4
