awenclaw

Reputation: 543

updateStateByKey, NoClassDefFoundError

I have a problem using the updateStateByKey() function. I have the following simple code (based on the book "Learning Spark: Lightning-Fast Big Data Analysis"):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object hello {
  // Add the number of values seen in this batch to the running count for the key
  def updateStateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = {
    Some(runningCount.getOrElse(0) + newValues.size)
  }

  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local[5]").setAppName("AndrzejApp")
    val ssc = new StreamingContext(conf, Seconds(4))
    ssc.checkpoint("/")  // checkpointing is required for stateful operations

    val lines7 = ssc.socketTextStream("localhost", 9997)
    val keyValueLine7 = lines7.map(line => (line.split(" ")(0), line.split(" ")(1).toInt))

    val statefullStream = keyValueLine7.updateStateByKey(updateStateFunction _)
    ssc.start()
    ssc.awaitTermination()
  }
}

My build.sbt is:

name := "stream-correlator-spark"

version := "1.0"

scalaVersion := "2.11.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.3.1" % "provided"
)

When I build it with the sbt assembly command everything goes fine. When I run it on a Spark cluster in local mode I get this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/dstream/DStream$ at hello$.main(helo.scala:25) ...

line 25 is:

val statefullStream = keyValueLine7.updateStateByKey(updateStateFunction _)

I suspect this might be a version compatibility problem, but I don't know what the cause might be or how to resolve it.

I would be really grateful for help!

Upvotes: 0

Views: 139

Answers (2)

Wesley Miao

Reputation: 861

You can add "provided" back when you need to submit your app to a Spark cluster to run. The benefit of "provided" is that the resulting fat jar does not include classes from the provided dependencies, which yields a much smaller fat jar compared to not using "provided". In my case, the resulting jar was around 90 MB without "provided" and shrank to 30+ MB with it.
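As a sketch of that workflow: when the dependencies are marked "provided", the Spark installation supplies them at runtime, so you submit the slim assembly jar with spark-submit. The jar path below assumes the sbt-assembly default naming for this project's name, version, and Scala version; adjust it to your actual output.

```shell
# Build the fat jar; with "provided" set, Spark classes are excluded from it
sbt assembly

# Submit to Spark; the cluster's own Spark libraries satisfy the "provided" deps
spark-submit \
  --class hello \
  --master local[5] \
  target/scala-2.11/stream-correlator-spark-assembly-1.0.jar
```

Running the same jar directly with `java -jar`, by contrast, fails with NoClassDefFoundError because nothing supplies the provided classes.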

Upvotes: 0

Odomontois

Reputation: 16328

When you write "provided" in sbt, it means exactly that: the library is provided by the environment and does not need to be included in the package. Try removing the "provided" mark from the "spark-streaming" dependency.
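A minimal sketch of the suggested build.sbt, keeping spark-core as "provided" but bundling spark-streaming into the assembly (you could also drop "provided" from both while developing locally):

```scala
name := "stream-correlator-spark"

version := "1.0"

scalaVersion := "2.11.4"

libraryDependencies ++= Seq(
  // still supplied by the Spark runtime on a cluster
  "org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
  // no "provided" scope: spark-streaming classes go into the fat jar,
  // so DStream and friends are on the classpath when you run the jar directly
  "org.apache.spark" %% "spark-streaming" % "1.3.1"
)
```

The trade-off, as the other answer notes, is a larger fat jar; re-add "provided" before submitting to a real cluster.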

Upvotes: 1
