ashic

Reputation: 6495

Spark Streaming standalone app and dependencies

I've got a Scala Spark Streaming application that I'm running from inside IntelliJ. When I run it against local[2], it runs fine. If I set the master to spark://masterip:port, then I get the following exception:

java.lang.ClassNotFoundException: RmqReceiver

I should add that I've got a custom receiver implemented in the same project called RmqReceiver. This is my app's code:

import akka.actor.{Props, ActorSystem}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkContext, SparkConf}

object Streamer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(true).setMaster("spark://192.168.40.2:7077").setAppName("Streamer")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(2))
    val messages = ssc.receiverStream(new RmqReceiver(...))
    messages.print()
    ssc.start()
    ssc.awaitTermination()
  }
}

The RmqReceiver class is in the same scala folder as Streamer. I understand that using spark-submit with --jars for dependencies will likely make this work. Is there any way to get this working from inside the application?

Upvotes: 1

Views: 691

Answers (1)

Eugene Zhulenev

Reputation: 9734

To run a job on a standalone Spark cluster, the cluster needs to know about every class your application uses. You could add them to the Spark classpath at startup, but that is awkward and I don't recommend it.

Instead, package your application as an uber-jar (bundle all of its dependencies into a single jar file) and then add that jar to the SparkConf jars.

We use the sbt-assembly plugin for this. If you're using Maven, the maven-assembly-plugin provides the same functionality.
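
For reference, a minimal sbt-assembly setup might look like this (the plugin and Spark versions here are assumptions; adjust them to your build):

// project/plugins.sbt -- pulls in the sbt-assembly plugin
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

// build.sbt -- Spark itself is already on the cluster, so it can be
// marked "provided" to keep the uber-jar small
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.2.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.2.0" % "provided"
)

Running sbt assembly then produces a single jar (by default under target/scala-<scalaVersion>/) that contains your own classes, including RmqReceiver.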

// Point the executors at the jar that contains your application classes
// (e.g. RmqReceiver) so they can load and run them.
val sparkConf = new SparkConf().
    setMaster(config.getString("spark.master")).
    setJars(SparkContext.jarOfClass(this.getClass).toSeq)
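
Note that SparkContext.jarOfClass only finds a jar if the class was actually loaded from one; when you launch from the IDE your classes usually sit in a directory, so it returns nothing. In that case you can build the assembly first and point setJars at it explicitly. A sketch, where the jar path is an assumption (use wherever sbt assembly writes your uber-jar):

val sparkConf = new SparkConf().
    setMaster("spark://192.168.40.2:7077").
    setAppName("Streamer").
    // assumed path to the prebuilt uber-jar
    setJars(Seq("target/scala-2.10/streamer-assembly-0.1.jar"))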

I don't think you can do this from IntelliJ IDEA, but you can definitely do it as part of the sbt test phase.

Upvotes: 2
