Reputation: 5890
The same code runs fine on Spark standalone, but it fails when I run Spark on YARN. The exception is:
java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.common.xcontent.json.JsonXContent
which is thrown in the executor (YARN container). But I did include the Elasticsearch jar in the application assembly jar built with the Maven assembly plugin. The spark-submit command is as follows:
spark-submit --executor-memory 10g --executor-cores 2 --num-executors 2 \
  --queue thejob --master yarn --class com.batch.TestBat /lib/batapp-mr.jar 2016-12-20
The Maven dependencies are as follows:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-catalyst_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.6.3</version>
    <!-- <scope>provided</scope> -->
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.2.0-cdh5.7.0</version>
    <!-- <scope>provided</scope> -->
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.2.0-cdh5.7.0</version>
    <!-- <scope>provided</scope> -->
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-protocol</artifactId>
    <version>1.2.0-cdh5.7.0</version>
    <!-- <scope>provided</scope> -->
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-hadoop2-compat</artifactId>
    <version>1.2.0-cdh5.7.0</version>
    <!-- <scope>provided</scope> -->
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-common</artifactId>
    <version>1.2.0-cdh5.7.0</version>
    <!-- <scope>provided</scope> -->
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-hadoop-compat</artifactId>
    <version>1.2.0-cdh5.7.0</version>
    <!-- <scope>provided</scope> -->
</dependency>
<dependency>
    <groupId>com.sksamuel.elastic4s</groupId>
    <artifactId>elastic4s-core_2.10</artifactId>
    <version>2.3.0</version>
    <!-- <scope>provided</scope> -->
    <exclusions>
        <exclusion>
            <artifactId>elasticsearch</artifactId>
            <groupId>org.elasticsearch</groupId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>2.3.2</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>2.3.1</version>
    <exclusions>
        <exclusion>
            <artifactId>log4j-over-slf4j</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
    </exclusions>
</dependency>
The weird thing is that the executor can find the HBase and Elasticsearch jars, which are both included in the dependencies, but not some of the Elasticsearch classes, so I suspect a class conflict. I checked the assembly jar and it does include the "missing" classes.
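For debugging, here is a minimal sketch (assuming a plain SparkContext named sc; the class name is the one from the stack trace) that checks on the executors themselves whether the class resolves and which jar it is loaded from:

val className = "org.elasticsearch.common.xcontent.json.JsonXContent"
val report = sc.parallelize(1 to 2, 2).mapPartitions { _ =>
  try {
    // Class.forName triggers static initialization; "Could not initialize class"
    // usually means the class was found but its static init failed (often a version conflict)
    val cls = Class.forName(className)
    val src = Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation.toString)
    Iterator(s"OK, loaded from ${src.getOrElse("<unknown source>")}")
  } catch {
    case t: Throwable => Iterator(s"FAILED: $t")
  }
}.collect()
report.foreach(println)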
Upvotes: 2
Views: 501
Reputation: 29237
I can see you have already included the jar dependency.
You have also commented out the provided scope,
which means the dependency will be packaged into your assembly jar and will be available at deployment:
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.6.3</version>
</dependency>
The only thing I suspect is the classpath configuration in your spark-submit; please check it like below.
--conf "spark.driver.extraLibrayPath=$HADOOP_HOME/*:$HBASE_HOME/*:$HADOOP_HOME/lib/*:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar:$HDFS_PATH/*:$SOLR_HOME/*:$SOLR_HOME/lib/*" \
--conf "spark.executor.extraLibraryPath=$HADOOP_HOME/*" \
--conf "spark.driver.extraClassPath=$(echo /your directory of jars/*.jar | tr ' ' ',')
--conf "spark.executor.extraClassPath=$(echo /your directory of jars/*.jar | tr ' ' ',')
where "your directory of jars" is the extracted lib directory from your distribution.
You can also print the classpath from your program like below:
val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)
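To see the executor side as well (the driver and executor classpaths can differ on YARN), a rough sketch along the same lines, assuming an existing SparkContext sc:

val executorClasspaths = sc.parallelize(1 to 2, 2).mapPartitions { _ =>
  // Each task lists the jar URLs visible to its executor's system classloader
  val cl = ClassLoader.getSystemClassLoader.asInstanceOf[java.net.URLClassLoader]
  cl.getURLs.map(_.toString).toIterator
}.collect().distinct
executorClasspaths.foreach(println)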
EDIT: after executing the lines above, if you find an old duplicate jar present in your classpath, then include your libraries with your app or pass them with --jars, but also try setting spark.{driver,executor}.userClassPathFirst to true.
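As a rough sketch of that last suggestion (assuming the app builds its own SparkConf; the driver-side flag is normally passed to spark-submit with --conf rather than set in code, since the driver JVM is already running by then):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("TestBat")
  // Prefer classes from the application assembly jar over the ones
  // shipped with YARN/Spark when the executor resolves a class
  .set("spark.executor.userClassPathFirst", "true")
// spark.driver.userClassPathFirst is read when the driver JVM starts,
// so pass it via --conf on spark-submit instead of setting it here
val sc = new SparkContext(conf)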
Upvotes: 2