Reputation: 893
I'm currently trying to run some Scala code with Apache Spark in yarn-client mode against a Cloudera cluster, but the sbt run execution is aborted with the following Java exception:
[error] (run-main-0) org.apache.spark.SparkException: YARN mode not available ?
org.apache.spark.SparkException: YARN mode not available ?
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1267)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:199)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:100)
at SimpleApp$.main(SimpleApp.scala:7)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.YarnClientClusterScheduler
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1261)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:199)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:100)
at SimpleApp$.main(SimpleApp.scala:7)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
[trace] Stack trace suppressed: run last compile:run for the full output.
java.lang.RuntimeException: Nonzero exit code: 1
at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code: 1
15/11/24 17:18:03 INFO network.ConnectionManager: Selector thread was interrupted!
[error] Total time: 38 s, completed 24-nov-2015 17:18:04
I suppose the prebuilt Apache Spark distribution is built with YARN support, because if I run the same application through spark-submit in yarn-client mode, there is no Java exception anymore; however, YARN does not seem to allocate any resources, and I get the same message every second: INFO Client: Application report for application_1448366262851_0022 (state: ACCEPTED). I suppose this is due to a configuration issue.
I googled this last message, but I can't work out which YARN configuration I have to modify (nor where) to run my program with Spark on YARN.
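In case it helps with diagnosing this, here is a minimal sketch (assuming a SparkContext named sc has already been created) of how I check which ResourceManager the driver actually resolved, to see whether my client-side YARN configuration is picked up at all:
// Hypothetical diagnostic: print the ResourceManager settings from the Hadoop/YARN
// configuration the driver actually loaded.
println(sc.hadoopConfiguration.get("yarn.resourcemanager.address"))
println(sc.hadoopConfiguration.get("yarn.resourcemanager.hostname"))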
Context:
Scala Test Program:
UPDATE
Well, the SBT job failed because hadoop-client.jar and spark-yarn.jar were not on the classpath when the application was packaged and run by SBT.
Now, sbt run asks for the environment variables SPARK_YARN_APP_JAR and SPARK_JAR, with my build.sbt configured like this:
name := "File Searcher"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1"
libraryDependencies += "org.apache.spark" %% "spark-yarn" % "0.9.1" % "runtime"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0" % "runtime"
libraryDependencies += "org.apache.hadoop" % "hadoop-yarn-client" % "2.6.0" % "runtime"
resolvers += "Maven Central" at "https://repo1.maven.org/maven2"
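As a side note, here is a hedged sketch of what an aligned build.sbt might look like, assuming the Spark dependency versions should match the spark-assembly-1.3.0-hadoop2.4.0.jar referenced in the code below (the 1.3.0 versions and the "provided" scope are assumptions on my part, not something I have verified):
name := "File Searcher"
version := "1.0"
scalaVersion := "2.10.4"
// Assumption: align the Spark artifacts with the 1.3.0 assembly that ships with the
// installation, and mark them "provided" so they stay off the application jar,
// since the cluster already supplies them at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-yarn" % "1.3.0" % "provided"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0" % "provided"
resolvers += "Maven Central" at "https://repo1.maven.org/maven2"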
Is there any way to configure these variables "automatically"? I mean, I can set SPARK_JAR, since that jar comes with the Spark installation, but what about SPARK_YARN_APP_JAR? Moreover, when I set those variables manually, I notice that Spark doesn't take my custom configuration into account, even if I set the YARN_CONF_DIR variable. Is there a way to tell SBT to use my local Spark configuration?
In case it helps, here is the current (ugly) code I'm running:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "src/data/sample.txt"
    val sc = new SparkContext("yarn-client", "Simple App", "C:/spark/lib/spark-assembly-1.3.0-hadoop2.4.0.jar",
      List("target/scala-2.10/file-searcher_2.10-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()
    val numTHEs = logData.filter(line => line.contains("the")).count()
    println("Lines with the: %s".format(numTHEs))
  }
}
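For comparison, here is a hedged sketch of the same program written against SparkConf, leaving the master (yarn-client) and the application jar to be supplied on the spark-submit command line instead of hard-coding them in the constructor; this is just an alternative I'm considering, not something I have verified on the cluster:
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // Assumption: the master (e.g. yarn-client) and the application jar are
    // passed to spark-submit rather than set here.
    val conf = new SparkConf().setAppName("Simple App")
    val sc = new SparkContext(conf)

    val logData = sc.textFile("src/data/sample.txt", 2).cache()
    val numTHEs = logData.filter(line => line.contains("the")).count()
    println("Lines with the: %s".format(numTHEs))

    sc.stop()
  }
}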
Thanks!
Cheloute
Upvotes: 0
Views: 1328
Reputation: 893
Well, I finally found what my issue was.
That's it. Everything else should work.
Upvotes: 0