Philip Philippou

Reputation: 43

AWS EMR Multiple Jobs Dependency Contention

Problem

I am attempting to run two PySpark steps in EMR, both reading from Kinesis using KinesisUtils. This requires the dependent library spark-streaming-kinesis-asl_2.11.

I'm using Terraform to stand up the EMR cluster and to invoke both steps with the args:

--packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.5

There appears to be contention on startup, with both steps downloading the jar from Maven at the same time and causing a checksum failure.
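For context, each step effectively runs something along these lines (the script locations below are placeholders, not the real paths), so both steps hit the same shared Ivy cache at the same time:

    spark-submit \
      --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.5 \
      s3://my-bucket/jobs/step_one.py

    spark-submit \
      --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.5 \
      s3://my-bucket/jobs/step_two.py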

Things attempted

  1. I've tried to move the download of the jar to the bootstrap bash script using:

sudo spark-shell --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.5

This causes problems because spark-shell is only available on the master node, while bootstrap actions run on all nodes.

  2. I've tried to limit the above to run only on the master node using

grep -q '"isMaster":true' /mnt/var/lib/info/instance.json || { echo "Not running on master node, nothing further to do" && exit 0; }

That didn't seem to work.
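For reference, the bootstrap script I was aiming for combined the two attempts above, roughly as follows (a sketch of the intent, not a working script):

    #!/bin/bash
    # Exit early on core/task nodes; only the master node should continue.
    grep -q '"isMaster":true' /mnt/var/lib/info/instance.json \
      || { echo "Not running on master node, nothing further to do" && exit 0; }

    # Pre-fetch the Kinesis ASL package into the Ivy cache so the steps
    # can later be submitted without --packages.
    sudo spark-shell --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.5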

  3. I've attempted to add Spark configuration to do this in the EMR configuration.json

    {
      "Classification": "spark-defaults",
      "Properties": {
        "spark.jars.packages": "org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.5"
      }
    }

This also didn't work, and seemed to stop any jars being copied to the master node directory

/home/hadoop/.ivy2/cache

What does work manually is logging onto the master node and running

sudo spark-shell --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.5

Then submitting the jobs manually without the --packages option.
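In other words, roughly:

    # On the master node: warm the Ivy cache once.
    sudo spark-shell --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.5

    # Then submit each job without --packages (script location is a placeholder).
    spark-submit s3://my-bucket/jobs/step_one.py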

Currently, all I need to do is manually restart the failed jobs separately (by cloning the steps in the AWS console) and everything runs fine.

I just want to be able to start the cluster with all steps starting successfully; any help would be greatly appreciated.

Upvotes: 4

Views: 717

Answers (1)

srikanth holur

Reputation: 780

  1. Download the required jars and upload them to S3 (a one-time task).
  2. When running your PySpark jobs from a step, pass --jars <s3 location of jar> to your spark-submit, as sketched below.
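A minimal sketch of that approach, assuming a placeholder bucket name and the coordinates from the question (the ASL package also has transitive dependencies, so more jars may be needed):

    # One time: fetch the jar from Maven Central and upload it to S3.
    wget https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kinesis-asl_2.11/2.4.5/spark-streaming-kinesis-asl_2.11-2.4.5.jar
    aws s3 cp spark-streaming-kinesis-asl_2.11-2.4.5.jar s3://my-bucket/jars/

    # In each EMR step: point spark-submit at the jar in S3 instead of using --packages.
    spark-submit \
      --jars s3://my-bucket/jars/spark-streaming-kinesis-asl_2.11-2.4.5.jar \
      s3://my-bucket/jobs/step_one.py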

Upvotes: 3
