JRhino

Reputation: 99

How to get Spark in Java working - Could not initialize class org.apache.spark.util.Utils$

I am attempting to connect to a standalone Spark server from a Java application using the following code:

SparkConf sparkConf_new = new SparkConf()
    .setAppName("Example Spark App")
    .setMaster("spark://my.server.com:7077");
JavaSparkContext sparkContext = new JavaSparkContext(sparkConf_new);
JavaRDD<String> stringJavaRDD = sparkContext.textFile("hdfs://cluster/my/path/test.csv");
out.println("Number of lines in file = " + stringJavaRDD.count());

I am receiving the following error:

An exception occurred at line 12

12: SparkConf sparkConf_new = new SparkConf()
13:     .setAppName("Example Spark App")
14:     .setMaster("spark://my.server.com:7077");
15: JavaSparkContext sparkContext = new JavaSparkContext(sparkConf_new);
16: JavaRDD<String> stringJavaRDD = sparkContext.textFile("hdfs://cluster/my/path/test.csv");
17: out.println("Number of lines in file = " + stringJavaRDD.count());

java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.util.Utils$
    at org.apache.spark.SparkConf.<init>(SparkConf.scala:59)
    at org.apache.spark.SparkConf.<init>(SparkConf.scala:53)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:123)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:54)

The following JARs are included:

scala-library-2.10.5.jar
spark-core_2.10-1.6.0.jar
hadoop-core-1.2.1.jar

Upvotes: 1

Views: 5307

Answers (2)

Brad

Reputation: 15879

You typically package your application into an uber JAR and use the $SPARK_HOME/bin/spark-submit script to send it to the server for execution.

Try creating the simplest possible application to start with. Using Maven, all you should need in your project dependencies is:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <!-- version is an assumption, chosen to match the 1.6.0 jars listed in the question -->
    <version>1.6.0</version>
</dependency>
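To actually build the uber JAR mentioned above, a minimal maven-shade-plugin configuration could look like the sketch below; the plugin version and layout are assumptions, not something taken from the question or answer:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <!-- plugin version is an assumption; use whatever your build standardizes on -->
            <version>2.4.3</version>
            <executions>
                <execution>
                    <!-- run the shade goal when the project is packaged -->
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

Running mvn package then leaves a shaded JAR in target/ that you can hand to spark-submit.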

Doing it this way, all of your environment configuration (server URL, etc.) can be defined outside of your Java code in a script, making it more portable.
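A hedged sketch of such a spark-submit invocation, assuming the uber JAR is named example-spark-app.jar and the main class is com.example.ExampleSparkApp (both names are placeholders, not from the question):

# submit the shaded JAR to the standalone master from the question
$SPARK_HOME/bin/spark-submit \
    --class com.example.ExampleSparkApp \
    --master spark://my.server.com:7077 \
    target/example-spark-app.jar

With the master passed on the command line like this, the hard-coded setMaster(...) call in the Java code can be dropped.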

Upvotes: 1

Mariusz

Reputation: 13926

If you write an application in Spark, even if you are submitting it to a remote cluster, these three jars are not enough. You should add all Spark dependencies to the classpath of the application.

The easiest way is to use Maven or Gradle (see http://spark.apache.org/docs/1.6.3/programming-guide.html#linking-with-spark), which will include Spark and all of its transitive dependencies. If you cannot use a build system, download an official Spark build and add all jars in its jars/ directory to the classpath of your application.
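For the manual route, a minimal sketch of launching the application with the downloaded Spark jars on the classpath; the install path, JAR name, and main class below are placeholders:

# assumed layout: application JAR in the current directory, Spark unpacked under /opt/spark
java -cp "example-spark-app.jar:/opt/spark/jars/*" com.example.ExampleSparkApp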

Upvotes: 0
