NITIN GUPTA

Reputation: 59

Error while using the Delta Lake source in Spark 2.4 (HDInsight)

I am getting the error below. The same code works in Databricks but not in HDInsight. I have also added the Delta library and the hadoop-azure library to the classpath:

io.delta:delta-core_2.11:0.5.0,org.apache.hadoop:hadoop-azure:3.1.3

ERROR ApplicationMaster [Driver]: User class threw exception: com.google.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: com/fasterxml/jackson/module/scala/experimental/ScalaObjectMapper$class
com.google.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: com/fasterxml/jackson/module/scala/experimental/ScalaObjectMapper$class
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4873)
    at org.apache.spark.sql.delta.DeltaLog$.apply(DeltaLog.scala:740)
    at org.apache.spark.sql.delta.DeltaLog$.forTable(DeltaLog.scala:712)
    at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:169)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at io.delta.tables.DeltaTable$.forPath(DeltaTable.scala:635)
    

Upvotes: 4

Views: 1393

Answers (2)

theDataNerd

Reputation: 103

As mentioned by @blob, the error is the result of a version conflict.
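If you want to confirm the conflict before changing the build, one option is to print which jar each Jackson class is actually loaded from on the cluster. The snippet below is only a sketch: `ClassOrigin` and `locate` are names I made up for illustration, and the list of class names is just a reasonable starting point. It uses plain JDK reflection, so it should run from the driver as-is.

```java
import java.security.CodeSource;

final class ClassOrigin {

    /** Returns the jar or directory a class was loaded from, without initializing the class. */
    static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassOrigin.class.getClassLoader();
        }
        try {
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (JDK classes) have no code source.
            if (src == null || src.getLocation() == null) {
                return "bootstrap (no code source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised while resolving the class.
            return "not found";
        }
    }

    public static void main(String[] args) {
        // Illustrative list: classes from the Jackson artifacts involved in the stack trace.
        String[] names = {
            "com.fasterxml.jackson.databind.ObjectMapper",
            "com.fasterxml.jackson.module.scala.DefaultScalaModule",
            "com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the printed locations point at jars shipped with the cluster rather than the versions your build expects, that is a strong hint the cluster's Jackson is shadowing yours.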

If you're using a Maven-based project, you can configure the Maven Shade plugin to relocate (rename) the Jackson packages that Delta depends on, which resolves the conflict.

<plugins>
... 
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.1</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <finalName>NAME-OF-YOUR-SHADED-JAR-FILE</finalName>
          <filters> <!-- exclude these files from artifacts to avoid SecurityException on signed jars -->
            <filter>
              <artifact>*:*</artifact>
              <excludes>
                <exclude>META-INF/LICENSE</exclude>
                <exclude>META-INF/*.SF</exclude>
                <exclude>META-INF/*.DSA</exclude>
                <exclude>META-INF/*.RSA</exclude>
              </excludes>
            </filter>
          </filters>
          <relocations> <!-- renames the packages so that delta uses these instead of provided jars -->
            <relocation>
              <pattern>com.fasterxml.jackson</pattern>
              <shadedPattern>noc.com.fasterxml.jackson</shadedPattern>
            </relocation>
            <relocation><!-- optional; Guava classes live under com.google.common -->
              <pattern>com.google.common</pattern>
              <shadedPattern>noc.com.google.common</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
      </execution>
    </executions>
  </plugin>
...
</plugins>

Also make sure that your pom.xml lists these dependencies in the given order:

 <!-- jackson related dependencies of delta -->
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <version>2.6.7</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.6.7.1</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-annotations</artifactId>
  <version>2.6.7</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.module</groupId>
  <artifactId>jackson-module-scala_2.11</artifactId>
  <version>2.6.7.1</version>
</dependency>
<!-- /jackson related dependency of delta -->

<!-- delta -->
<!-- https://mvnrepository.com/artifact/io.delta/delta-core -->
<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-core_2.11</artifactId>
  <version>0.6.1</version>
</dependency>
<!-- /delta -->

Upvotes: 0

Dhananjay

Reputation: 3975

There is a conflict between the version of the Jackson libraries packaged with HDInsight and the version used by Spark and Delta Lake.

There are two ways to get around this:

  1. Package the Jackson 2.6.7 dependencies into your application (Maven Shade plugin or sbt assembly).

Or

  2. If you are using a Jupyter notebook, set the Spark configuration below:
{"conf":
 {"spark.jars.packages": "io.delta:delta-core_2.11:0.5.0",
  "spark.driver.extraClassPath": "${PATH}/jackson-module-scala_2.11-2.6.7.1.jar:${PATH}/jackson-annotations-2.6.7.jar:${PATH}/jackson-core-2.6.7.jar:${PATH}/jackson-databind-2.6.7.1.jar:${PATH}/jackson-module-paranamer-2.6.7.jar",
  "spark.executor.extraClassPath": "${PATH}/jackson-module-scala_2.11-2.6.7.1.jar:${PATH}/jackson-annotations-2.6.7.jar:${PATH}/jackson-core-2.6.7.jar:${PATH}/jackson-databind-2.6.7.1.jar:${PATH}/jackson-module-paranamer-2.6.7.jar",
  "spark.driver.userClassPathFirst": true}}
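One thing to watch with the classpath values: the entry separator is platform-dependent (`:` on Linux, `;` on Windows), and the whole value has to be a single string. If you'd rather not assemble it by hand, here is a small sketch that builds the string from a directory and a list of jar names. `ExtraClassPath`, `build`, and the `/path/to/jars` directory are hypothetical, and it assumes the machine running it uses the same separator as the cluster nodes.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExtraClassPath {

    /** Joins dir/jar entries with the separator of the platform this code runs on. */
    static String build(String dir, List<String> jarNames) {
        List<String> entries = new ArrayList<>();
        for (String jar : jarNames) {
            entries.add(dir + "/" + jar);
        }
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Jar names taken from the configuration above; the directory is a placeholder.
        List<String> jars = Arrays.asList(
            "jackson-module-scala_2.11-2.6.7.1.jar",
            "jackson-annotations-2.6.7.jar",
            "jackson-core-2.6.7.jar",
            "jackson-databind-2.6.7.1.jar",
            "jackson-module-paranamer-2.6.7.jar");
        System.out.println(build("/path/to/jars", jars));
    }
}
```

The printed value can be pasted into both `spark.driver.extraClassPath` and `spark.executor.extraClassPath`.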

Upvotes: 1
