Vivek

Reputation: 13

Getting java.lang.ClassNotFoundException when I try to do spark-submit; I referred to other similar questions online but couldn't get it to work

I am new to Spark and am trying to run a simple Spark jar, built through Maven in IntelliJ, on a Hadoop cluster. But I get a ClassNotFoundException every way I try to submit the application through spark-submit.

My pom.xml:

<?xmlversion="1.0"encoding="UTF-8"?>
<projectxmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>org.example</groupId>
<artifactId>SparkTrans</artifactId>
<version>1.0-SNAPSHOT</version>

<dependencies>
<!--https://mvnrepository.com/artifact/org.apache.spark/spark-core-->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.4.3</version>
</dependency>
<!--https://mvnrepository.com/artifact/org.apache.spark/spark-sql-->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.4.3</version>
</dependency>

<!--https://mvnrepository.com/artifact/org.apache.spark/spark-hive-->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.4.3</version>
<scope>compile</scope>
</dependency>
<!--https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-slf4j-impl-->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>2.8</version>
<scope>test</scope>
</dependency>
<!--https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-api-->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.8</version>
</dependency>
<!--https://mvnrepository.com/artifact/com.typesafe/config-->
<dependency>
<groupId>com.typesafe</groupId>
<artifactId>config</artifactId>
<version>1.3.4</version>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.11</artifactId>
<version>3.1.1</version>
<scope>test</scope>
</dependency>
</dependencies>


<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.0</version>
<executions>
<execution>
<id>shade-libs</id>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
<exclude>resources/*</exclude>
</excludes>
</filter>
</filters>
<shadedClassifierName>fat</shadedClassifierName>
<shadedArtifactAttached>true</shadedArtifactAttached>
<relocations>
<relocation>
<pattern>org.apache.http.client</pattern>
<shadedPattern>shaded.org.apache.http.client</shadedPattern>
</relocation>
</relocations>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>


</project>

My main Scala object (SparkTrans.scala):

import common.InputConfig
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.slf4j.LoggerFactory

object SparkTrans {

  private val logger = LoggerFactory.getLogger(getClass.getName)

  def main(args: Array[String]): Unit = {
    try {
      logger.info("main method started")
      logger.warn("This is a warning")

      val arg_length = args.length

      if (arg_length == 0) {
        logger.warn("No Argument passed")
        System.exit(1)
      }

      val inputConfig: InputConfig = InputConfig(env = args(0), targetDB = args(1))
      println("The first argument passed is " + inputConfig.env)
      println("The second argument passed is " + inputConfig.targetDB)

      val spark = SparkSession
        .builder()
        .appName("SparkPOCinside")
        .config("spark.master", "yarn")
        .enableHiveSupport()
        .getOrCreate()

      println("Created Spark Session")

      val sampleSeq = Seq((1, "Spark"), (2, "BigData"))

      val df1 = spark.createDataFrame(sampleSeq).toDF("courseid", "coursename")
      df1.show()

      logger.warn("sql_test_a method started")
      val courseDF = spark.sql("select * from MYINSTANCE.sql_test_a")
      logger.warn("sql_test_a method ended")
      courseDF.show()

    } catch {
      case e: Exception =>
        // pass the exception to the logger; printStackTrace() returns Unit,
        // so concatenating it into the message would log nothing useful
        logger.error("An error has occurred in the main method", e)
    }
  }

}

I tried the commands below with spark-submit, but all of them give a ClassNotFoundException. I also tried switching the arguments around so that --class comes right after --deploy-mode, but in vain:

spark-submit --master yarn --deploy-mode cluster --queue ABCD --conf spark.yarn.security.tokens.hive.enabled=false --files hdfs://nameservice1/user/XMLs/hive-site.xml --keytab hdfs://nameservice1/user/MYINSTANCE/landing/workflow/wf_data/lib/MYKEY.keytab --num-executors 1 --executor-cores 1 --executor-memory 2g --conf spark.yarn.executor.memoryOverhead=3072 --class org.example.SparkTrans hdfs://nameservice1/user/MYINSTANCE/landing/workflow/wf_data/SparkTrans-1.0-SNAPSHOT-fat.jar dev somedb


spark-submit --master yarn --deploy-mode cluster --queue ABCD --conf spark.yarn.security.tokens.hive.enabled=false --files hdfs://nameservice1/user/XMLs/hive-site.xml --keytab hdfs://nameservice1/user/MYINSTANCE/landing/workflow/wf_data/lib/MYKEY.keytab --num-executors 1 --executor-cores 1 --executor-memory 2g --conf spark.yarn.executor.memoryOverhead=3072 --class org.example.SparkTrans --name org.example.SparkTrans hdfs://nameservice1/user/MYINSTANCE/landing/workflow/wf_data/SparkTrans-1.0-SNAPSHOT-fat.jar dev somedb


spark-submit --master yarn --deploy-mode cluster --queue ABCD --conf spark.yarn.security.tokens.hive.enabled=false --files hdfs://nameservice1/user/XMLs/hive-site.xml --keytab hdfs://nameservice1/user/MYINSTANCE/landing/workflow/wf_data/lib/MYKEY.keytab --num-executors 1 --executor-cores 1 --executor-memory 2g --conf spark.yarn.executor.memoryOverhead=3072 --class SparkTrans hdfs://nameservice1/user/MYINSTANCE/landing/workflow/wf_data/SparkTrans-1.0-SNAPSHOT-fat.jar dev somedb

Exact error I am getting:

btrace WARNING: No output stream. DataCommand output is ignored.
[main] INFO ResourceCollector - Unravel Sensor 4.6.1.8rc0013/2.0.3 initializing.
21/06/11 10:09:27 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
21/06/11 10:09:28 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1614625006458_6646161_000001
21/06/11 10:09:30 INFO spark.SecurityManager: Changing view acls to: MYKEY
21/06/11 10:09:30 INFO spark.SecurityManager: Changing modify acls to: MYKEY
21/06/11 10:09:30 INFO spark.SecurityManager: Changing view acls groups to: 
21/06/11 10:09:30 INFO spark.SecurityManager: Changing modify acls groups to: 
21/06/11 10:09:30 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(MYKEY); groups with view permissions: Set(); users  with modify permissions: Set(MYKEY); groups with modify permissions: Set()
21/06/11 10:09:30 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
21/06/11 10:09:30 ERROR yarn.ApplicationMaster: Uncaught exception: 
java.lang.ClassNotFoundException: SparkTrans
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:561)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:347)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:197)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:695)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:693)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
21/06/11 10:09:30 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.lang.ClassNotFoundException: SparkTrans)
21/06/11 10:09:30 INFO util.ShutdownHookManager: Shutdown hook called

Can any of you let me know what I am doing wrong? I have checked that hive-site.xml and my jar are in the correct locations in HDFS, as mentioned in my commands.

Upvotes: 0

Views: 1311

Answers (1)

roby

Reputation: 3273

You need to add the scala-maven-plugin configuration to your pom.xml. Without it there is nothing to compile your SparkTrans.scala file into Java classes, so the class never ends up in your jar and YARN can't load it at runtime.

Add:

<project>
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>4.5.2</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

to your pom.xml, and make sure your Scala file is under src/main/scala.
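One more thing worth double-checking (an assumption on my part, since your snippet doesn't show a package line): your spark-submit commands use --class org.example.SparkTrans, so the file should start with package org.example and live at the matching path:

src/main/scala/org/example/SparkTrans.scala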

Then it should be compiled and added to your jar. Here's the documentation for the scala plugin.
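Once the plugin is in place, rebuilding should produce the shaded jar (the artifact name below is inferred from the shade-plugin configuration in your pom):

mvn clean package
# should produce target/SparkTrans-1.0-SNAPSHOT-fat.jar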

You can check what's in your jar with jar tf jar-file; see the guide here.
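For example, something like this should show the compiled class inside your fat jar (the grep is just for illustration):

jar tf SparkTrans-1.0-SNAPSHOT-fat.jar | grep SparkTrans
# expect to see org/example/SparkTrans.class (plus SparkTrans$.class for the Scala object)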

Upvotes: 1
