Subhadip Majumder

Reputation: 333

How to create SparkSession with Hive support (fails with "Hive classes are not found")?

I'm getting an error while trying to run the following code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class App {
  public static void main(String[] args) throws Exception {
    SparkSession
      .builder()
      .enableHiveSupport()
      .getOrCreate();        
  }
}

Output:

Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
    at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:778)
    at com.training.hivetest.App.main(App.java:21)

How can it be resolved?

Upvotes: 32

Views: 49884

Answers (10)

malanb5

Reputation: 304

Ensure that you are running your jar via the spark-submit script:

${SPARK_HOME}/bin/spark-submit <settings> <your-jar-name>

spark-submit sets up the required classpath (including the Spark and Scala libraries) before executing your jar.

Also, as others have mentioned, make sure the required dependency is declared in your build.

Example: Running a Spark Session

pom.xml
---
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.11</artifactId>
  <version>2.4.4</version>
  <scope>compile</scope>
</dependency>

Test.java
---
SparkSession spark = SparkSession
    .builder()
    .appName("FeatureExtractor")
    .config("spark.master", "local")
    .config("spark.sql.hive.convertMetastoreParquet", false)
    .config("spark.submit.deployMode", "client")
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.11:2.4.4")
    .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
    .config("hive.metastore.uris", "thrift://hivemetastore:9083")
    .enableHiveSupport()
    .getOrCreate();

Then, to execute this code via spark-submit:

bin/spark-submit \
--class com.TestExample \
--executor-memory 1G \
--total-executor-cores 2 \
test.jar

Thank you to @lamber-ken who helped me with this issue.

For more information:

Spark Documentation: Submitting Applications

Exception Unable to instantiate SparkSession with Hive support because Hive classes are not found

Upvotes: 1

Jacek Laskowski

Reputation: 74619

tl;dr You have to make sure that Spark SQL's spark-hive dependency and all of its transitive dependencies are available at runtime on the CLASSPATH of your Spark SQL application (not just at build time, which only needs them for compilation).


In other words, you have to have org.apache.spark.sql.hive.HiveSessionStateBuilder and org.apache.hadoop.hive.conf.HiveConf classes on the CLASSPATH of the Spark application (which has little to do with sbt or maven).

The former, HiveSessionStateBuilder, is part of the spark-hive dependency (including all of its transitive dependencies).

The latter, HiveConf, is part of the hive-exec dependency (which is itself a transitive dependency of spark-hive).
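
As a quick sanity check (a sketch, not part of the fix itself), you can probe for the two classes named above at runtime before building the session; if either lookup fails, the spark-hive jars are missing from the runtime CLASSPATH:

try {
    // The two classes mentioned above; SparkSession needs both to enable Hive support.
    Class.forName("org.apache.spark.sql.hive.HiveSessionStateBuilder");
    Class.forName("org.apache.hadoop.hive.conf.HiveConf");
    System.out.println("Hive classes found; enableHiveSupport() should work");
} catch (ClassNotFoundException e) {
    System.out.println("Missing from runtime CLASSPATH: " + e.getMessage());
}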

Upvotes: 3

ganesh hegde

Reputation: 73

In my case, I had to check

Include dependencies with "Provided" scope

under my Run/Debug Configuration in IntelliJ.

Upvotes: 1

Deepesh Rehi

Reputation: 927

Even if the top answers are correct and you have declared the jars in your pom, you may still run into the error described in the question.

To resolve it, make sure the versions of all your Spark dependencies match. As a standard practice, define properties for the Spark version and the Scala version and reference them in every dependency, so that version mismatches cannot creep in.

For reference:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.xxx.rehi</groupId>
    <artifactId>Maven9211</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <scala.version>2.12</scala.version>
        <spark.version>2.4.4</spark.version>
    </properties>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
</project>

Upvotes: 1

Sachin Patil

Reputation: 86

For SBT, use:

// https://mvnrepository.com/artifact/org.apache.spark/spark-hive
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0"

We used spark-core 2.1.0 and spark-sql 2.1.0; the matching entries are sketched below.
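
For completeness, a sketch of the matching build.sbt entries (assuming the same 2.1.0 version as above):

// Core and SQL modules, same version as spark-hive above
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0"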

Upvotes: 1

Kevin Lawrence

Reputation: 62

[Updating my answer] This answer on StackOverflow is right - answer link.

I also faced issues building and running Spark with Hive support. Based on the above answer, I did the following in my Scala 2.12.8 project:

  1. Updated my build.sbt to the content below
  2. Manually removed the files in .idea/libraries
  3. Clicked the 'Refresh all sbt projects' button in the SBT Shell window (I am using IntelliJ)

I can now run the project without any issues.

libraryDependencies += "junit" % "junit" % "4.12" % Test
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.2",
  "org.apache.spark" %% "spark-sql" % "2.4.2",
  "org.apache.spark" %% "spark-hive" % "2.4.2" % "provided",
  "org.scalatest" %% "scalatest" % "3.0.3" % Test
)

Upvotes: 0

Harry Nguyen

Reputation: 44

My full list of dependencies for Spark 2.4.1 is here:

  <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.12</artifactId>
      <version>2.4.1</version>
  </dependency>

  <dependency>
      <groupId>org.apache.calcite</groupId>
      <artifactId>calcite-avatica</artifactId>
      <version>1.6.0</version>
  </dependency>
  <dependency>
      <groupId>org.apache.calcite</groupId>
      <artifactId>calcite-core</artifactId>
      <version>1.12.0</version>
  </dependency>
  <dependency>
      <groupId>org.spark-project.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>1.2.1.spark2</version>
  </dependency>
  <dependency>
      <groupId>org.spark-project.hive</groupId>
      <artifactId>hive-metastore</artifactId>
      <version>1.2.1.spark2</version>
  </dependency>
  <dependency>
      <groupId>org.codehaus.jackson</groupId>
      <artifactId>jackson-mapper-asl</artifactId>
      <version>1.9.13</version>
  </dependency>

  <!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core -->
  <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-core</artifactId>
      <version>2.6.7</version>
  </dependency>

  <!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind -->
  <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.6.7.1</version>
  </dependency>


  <!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-annotations -->
  <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-annotations</artifactId>
      <version>2.6.7</version>
  </dependency>

  <!-- https://mvnrepository.com/artifact/org.codehaus.janino/janino -->
  <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>janino</artifactId>
      <version>3.0.9</version>
  </dependency>

  <!-- https://mvnrepository.com/artifact/org.codehaus.janino/commons-compiler -->
  <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>commons-compiler</artifactId>
      <version>3.0.9</version>
  </dependency>

Upvotes: -1

Sruthi Poddutur

Reputation: 1491

I had the same problem and resolved it by adding the following dependencies. (I put this list together from the compile-dependencies section of the spark-hive_2.11 page on mvnrepository.com):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-avatica</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-core</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.9.13</version>
</dependency>

where scala.binary.version = 2.11 and spark.version = 2.1.0

<properties>
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>2.1.0</spark.version>
</properties>

Upvotes: 2

xuchuanyin

Reputation: 29

I've looked into the source code and found that, in addition to HiveSessionState (in spark-hive), another class, HiveConf, is also needed to initiate the SparkSession. HiveConf is not contained in the spark-hive*.jar; you can find it in Hive-related jars and put it on your classpath.
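
For example, HiveConf ships with the hive-exec artifact. A hedged Maven snippet; the version below is the Spark-forked Hive used in other answers on this page, so adjust it to match your Spark build:

<!-- Provides org.apache.hadoop.hive.conf.HiveConf at runtime (sketch; version is an assumption) -->
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1.spark2</version>
</dependency>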

Upvotes: 2

abaghel

Reputation: 15297

Add the following dependency to your Maven project.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
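
If the error persists after adding it, one quick way to confirm that the dependency (and its transitive hive-exec) actually resolved is to inspect the dependency tree, for example:

mvn dependency:tree | grep -i hive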

Upvotes: 40
