gprivitera

Reputation: 943

How to create a Spark Streaming jar that would work in AWS EMR?

I've been developing a Spark Streaming application with Eclipse, and I'm using sbt to run it locally.

Now I want to deploy the application on AWS as a jar, but when I use sbt's package command it builds a jar without the dependencies, so when I upload it to AWS it fails because the Scala library is missing.

Is there a way to create an uber-jar with sbt? Or am I doing something wrong with the deployment of Spark on AWS?

Upvotes: 1

Views: 2105

Answers (2)

prabeesh

Reputation: 945

To create an uber-jar with sbt, use the sbt-assembly plugin. For more details about building an uber-jar with sbt-assembly, refer to the blog post.
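A minimal sketch of the setup, assuming a roughly Spark-1.0-era sbt-assembly (0.11.x); the versions and project name below are placeholders for whatever your build uses:

// project/plugins.sbt -- pulls in the sbt-assembly plugin (version is an example)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

// build.sbt -- older sbt-assembly needs the explicit settings import;
// marking Spark as "provided" keeps it out of the uber-jar, since the
// cluster already has the Spark classes on its classpath
import AssemblyKeys._

assemblySettings

name := "spark-streaming-app"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided"

Running sbt assembly then produces a single jar under target/scala-2.10/ containing your code plus the non-provided dependencies, including the Scala library that was missing from the plain package output.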

After building it, you can run the assembly jar with the java -jar command.

But from Spark 1.0.0 onwards, the spark-submit script in Spark’s bin directory is used to launch applications on a cluster; for more details refer here.
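With spark-submit the launch would look something along the lines of spark-submit --class com.example.StreamingApp --master <your master URL> target/scala-2.10/spark-streaming-app-assembly-0.1.jar, where the class name, master URL and jar path are placeholders for whatever your build produces.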

Upvotes: 2

Jacek Laskowski

Reputation: 74619

You should really be following Running Spark on EC2 that reads:

The spark-ec2 script, located in Spark’s ec2 directory, allows you to launch, manage and shut down Spark clusters on Amazon EC2. It automatically sets up Spark, Shark and HDFS on the cluster for you. This guide describes how to use spark-ec2 to launch clusters, how to run jobs on them, and how to shut them down. It assumes you’ve already signed up for an EC2 account on the Amazon Web Services site.

I've only partially followed the document so I can't comment on how well it's written.

Moreover, according to the Shipping Code to the Cluster chapter in the other document:

The recommended way to ship your code to the cluster is to pass it through SparkContext’s constructor, which takes a list of JAR files (Java/Scala) or .egg and .zip libraries (Python) to disseminate to worker nodes. You can also dynamically add new files to be sent to executors with SparkContext.addJar and addFile.
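A small sketch of that approach (the app name, master URL and jar paths are placeholders, and it assumes the assembly jar built above):

import org.apache.spark.{SparkConf, SparkContext}

// Listing the assembly jar in the configuration makes Spark ship it to the
// worker nodes when the context is created.
val conf = new SparkConf()
  .setAppName("spark-streaming-app")
  .setMaster("spark://<master-host>:7077") // placeholder master URL
  .setJars(Seq("target/scala-2.10/spark-streaming-app-assembly-0.1.jar"))

val sc = new SparkContext(conf)

// Jars and files can also be added dynamically after the context exists:
sc.addJar("extra-lib.jar")
sc.addFile("lookup-table.txt")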

Upvotes: 0
