Osama Abdulsattar

Reputation: 576

How to resolve Spark library conflict with Cloudera CDH 5.8.0 virtual box

I am trying to submit a job to Spark on the Cloudera CDH 5.8.0 VirtualBox image. I use the org.json library, and I use the maven-shade plugin to bundle the dependency into my jar file. The following is my pom:

<project>
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>spark</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <dependencies>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>1.5.1</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.json</groupId>
            <artifactId>json</artifactId>
            <version>20160810</version>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                    <finalName>uber-${project.artifactId}-${project.version}</finalName>
                </configuration>
            </plugin>
        </plugins>
    </build>


</project>

Submit command is:

spark-submit --class com.example.spark.SparkParser --master local[*] uber-spark-0.0.1-SNAPSHOT.jar 

And I keep getting following exception:

Exception in thread "main" java.lang.NoSuchMethodError:
org.json.JSONTokener.<init>(Ljava/io/InputStream;)

I found the following small snippet that can tell which jar a class was loaded from:

import java.net.URL; // import needed for the snippet below

ClassLoader classloader = org.json.JSONTokener.class.getClassLoader();
URL res = classloader.getResource("org/json/JSONTokener.class");
String path = res.getPath();
System.out.println("Core JSONTokener came from " + path);

And the output is the following:

Core JSONTokener came from file:/usr/lib/hive/lib/hive-exec-1.1.0-cdh5.8.0.jar!/org/json/JSONTokener.class

I can find the file locally in the CDH virtual box:

[cloudera@quickstart ~]$ ls -l /usr/lib/hive/lib/hive-exec-1.1.0-cdh5.8.0.jar
-rw-r--r-- 1 root root 19306194 Jun 16  2016 /usr/lib/hive/lib/hive-exec-1.1.0-cdh5.8.0.jar

I even tried marking the json library as 'provided' to exclude it from my jar file, but I still get the same error.

I tried removing the local jar file /usr/lib/hive/lib/hive-exec-1.1.0-cdh5.8.0.jar, and my code works correctly, but I am not sure this is the right solution, or whether removing this library would break Cloudera somehow.

So, how can I tell Spark not to use this local jar file, and to use the one included inside my 'uber-spark-0.0.1-SNAPSHOT.jar' file?

Upvotes: 0

Views: 775

Answers (1)

LoopBit

Reputation: 46

Not sure why no one has answered you before...

Your issue is that you have two different versions of the same library on the runtime classpath: one included in your jar, and another added by Cloudera. The JSONTokener constructor differs between the two versions (perhaps it doesn't exist in one of them, or its signature changed). Your code compiles against one version, but at runtime the ClassLoader resolves the class from the other, so the method is missing and you get NoSuchMethodError.

The short answer to your question is that you can't, at least not directly: the Java ClassLoader searches the classpath in order and, when you load a class, uses the first match it finds. In this case, that is the copy provided by the Hive runtime.
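One workaround that is commonly used for exactly this situation (not mentioned in the original answer, so treat it as a suggestion to verify) is class relocation with the maven-shade-plugin the question already uses: shading can rewrite the org.json packages inside the uber jar into a private namespace, so they no longer collide with the copy bundled in hive-exec. A sketch, added to the existing shade plugin `<configuration>` (the shaded package name `com.example.shaded.json` is arbitrary):

```xml
<configuration>
    <!-- existing <filters> and <finalName> stay as they are -->
    <relocations>
        <relocation>
            <!-- move org.json.* to a private package inside the uber jar -->
            <pattern>org.json</pattern>
            <shadedPattern>com.example.shaded.json</shadedPattern>
        </relocation>
    </relocations>
</configuration>
```

The shade plugin rewrites the bytecode references in your own classes as well, so your code keeps compiling against `org.json` but runs against the relocated copy.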

Longer answer: your only option to force the use of the jar bundled with your app is to edit the Spark defaults so that Hive's jars are not added to the classpath. I'm not entirely sure how to do that in your case, but I would start with /etc/spark/spark-defaults.conf and try to disable Hive there, or look for the equivalent setting in Cloudera Manager.
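Spark 1.5.x also has experimental settings that ask Spark to prefer classes from the user jar over the system classpath. They are not covered in this answer and can behave inconsistently on some distributions, so treat this as an experiment rather than a guaranteed fix:

```shell
# Experimental Spark 1.5.x options: resolve classes from the user jar first.
spark-submit \
  --class com.example.spark.SparkParser \
  --master 'local[*]' \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  uber-spark-0.0.1-SNAPSHOT.jar
```

The same two properties can instead be set permanently in /etc/spark/spark-defaults.conf.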

A better option is to remove the json jar from your project, add the Cloudera Maven repository to your pom, and include hive-exec-1.1.0-cdh5.8.0 as a provided dependency; see "Using the CDH 5 Maven Repository" in Cloudera's documentation for more details on how to do this.
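Concretely, that suggestion would look roughly like the following pom additions (coordinates follow Cloudera's CDH 5 Maven repository documentation; verify the exact version against your cluster):

```xml
<!-- Cloudera's public Maven repository, in <repositories> -->
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>

<!-- hive-exec as provided, in <dependencies>, replacing org.json:json -->
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.1.0-cdh5.8.0</version>
    <scope>provided</scope>
</dependency>
```

This means compiling against the same (older) org.json classes that Hive ships, so the code would likely need adjusting to the older JSONTokener API, for example passing a Reader rather than an InputStream.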

Hope this helps.

Upvotes: 1
