Reputation: 333
I'm getting an error while trying to run the following code:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class App {
    public static void main(String[] args) throws Exception {
        SparkSession
            .builder()
            .enableHiveSupport()
            .getOrCreate();
    }
}
Output:
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:778)
at com.training.hivetest.App.main(App.java:21)
How can it be resolved?
Upvotes: 32
Views: 49884
Reputation: 304
Ensure that you are running your jar via the spark-submit script:
${SPARK_HOME}/bin/spark-submit <settings> <your-jar-name>
This script sets up the classpath with the required Spark and Scala classes before executing your jar.
Also, as others have mentioned, make sure the required spark-hive dependency is declared in your build.
Example: Running a Spark Session
pom.xml
---
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.4</version>
    <scope>compile</scope>
</dependency>
Test.java
---
SparkSession spark = SparkSession
        .builder()
        .appName("FeatureExtractor")
        .config("spark.master", "local")
        .config("spark.sql.hive.convertMetastoreParquet", false)
        .config("spark.submit.deployMode", "client")
        .config("spark.jars.packages", "org.apache.spark:spark-avro_2.11:2.4.4")
        .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
        .config("hive.metastore.uris", "thrift://hivemetastore:9083")
        .enableHiveSupport()
        .getOrCreate();
Then execute the jar via spark-submit:
bin/spark-submit \
  --class com.TestExample \
  --executor-memory 1G \
  --total-executor-cores 2 \
  test.jar
Thank you to @lamber-ken who helped me with this issue.
For more information:
Spark Documentation: Submitting Applications
Exception Unable to instantiate SparkSession with Hive support because Hive classes are not found
Upvotes: 1
Reputation: 74619
tl;dr You have to make sure that Spark SQL's spark-hive
dependency and all transitive dependencies are available at runtime on the CLASSPATH of a Spark SQL application (not build time that is simply required for compilation only).
In other words, you have to have org.apache.spark.sql.hive.HiveSessionStateBuilder
and org.apache.hadoop.hive.conf.HiveConf
classes on the CLASSPATH of the Spark application (which has little to do with sbt or maven).
The former HiveSessionStateBuilder
is part of spark-hive
dependency (incl. all the transitive dependencies).
The latter HiveConf
is part of hive-exec
dependency (that is a transitive dependency of the above spark-hive
dependency).
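If you want to verify this before submitting anything, a minimal sketch like the following (the class name HiveClasspathCheck is mine, purely for illustration) mimics the reflective lookup that enableHiveSupport() performs before it will create a Hive-aware session:
// Sketch: check that the two Hive-related classes named above are on
// the runtime classpath, the same way enableHiveSupport() probes for them.
public class HiveClasspathCheck {
    public static void main(String[] args) {
        String[] required = {
            "org.apache.spark.sql.hive.HiveSessionStateBuilder", // from spark-hive
            "org.apache.hadoop.hive.conf.HiveConf"               // from hive-exec (transitive)
        };
        for (String className : required) {
            try {
                Class.forName(className);
                System.out.println("FOUND   " + className);
            } catch (ClassNotFoundException e) {
                System.out.println("MISSING " + className);
            }
        }
    }
}
If either class prints MISSING when run with the same classpath as your application, the IllegalArgumentException from the question is exactly what you will get.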
Upvotes: 3
Reputation: 73
In my case, I had to enable
"Include dependencies with 'Provided' scope"
under my Run/Debug Configuration in IntelliJ.
Upvotes: 1
Reputation: 927
While all the top answers are correct, if you are still facing issues, remember that the error described in the question can occur even when the jars are listed in your pom.
To resolve it, make sure all your Spark dependencies use the same version. As a standard practice, define properties for the Spark and Scala versions and reference them in each dependency, so a conflict between mismatched versions cannot creep in.
Just for the reference:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.xxx.rehi</groupId>
    <artifactId>Maven9211</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <scala.version>2.12</scala.version>
        <spark.version>2.4.4</spark.version>
    </properties>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
</project>
Upvotes: 1
Reputation: 86
For SBT, use:
// https://mvnrepository.com/artifact/org.apache.spark/spark-hive
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0"
We used spark-core 2.1.0 and spark-sql 2.1.0 alongside it.
Upvotes: 1
Reputation: 62
[Updating my answer] This answer on StackOverflow is right: answer link.
I also faced issues building and running Spark with Hive support. Based on that answer, I did the following in my Scala 2.12.8 project, and I can now run it without any issues:
libraryDependencies += "junit" % "junit" % "4.12" % Test
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.4.2",
"org.apache.spark" %% "spark-sql" % "2.4.2",
"org.apache.spark" %% "spark-hive" % "2.4.2" % "provided",
"org.scalatest" %% "scalatest" % "3.0.3" % Test
)
Upvotes: 0
Reputation: 44
My full list of dependencies for Spark 2.4.1 is below:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.12</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-avatica</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-core</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.9.13</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.6.7</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.6.7.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-annotations -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>2.6.7</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.codehaus.janino/janino -->
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.0.9</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.codehaus.janino/commons-compiler -->
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>commons-compiler</artifactId>
    <version>3.0.9</version>
</dependency>
Upvotes: -1
Reputation: 1491
I had the same problem. I resolved it by adding the following dependencies. (I compiled this list by referring to the compile-dependencies section of the spark-hive_2.11 page on mvnrepository.com):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-avatica</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-core</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.9.13</version>
</dependency>
where scala.binary.version = 2.11 and spark.version = 2.1.0
<properties>
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>2.1.0</spark.version>
</properties>
Upvotes: 2
Reputation: 29
I've looked into the source code and found that besides HiveSessionState (in spark-hive), another class, HiveConf, is also needed to initialize a SparkSession. HiveConf is not contained in the spark-hive*.jar; you may find it in Hive-related jars and put it on your classpath.
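As a quick way to see where (or whether) HiveConf is being loaded from, a small sketch like this (the class name WhichJar is mine, for illustration only) prints the jar that provides it at runtime:
public class WhichJar {
    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> cls = Class.forName("org.apache.hadoop.hive.conf.HiveConf");
        // getCodeSource() can be null for bootstrap-classloader classes,
        // but for an application dependency it points at the providing jar
        System.out.println(cls.getName() + " loaded from "
                + cls.getProtectionDomain().getCodeSource().getLocation());
    }
}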
Upvotes: 2
Reputation: 15297
Add the following dependency to your Maven project.
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
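With that dependency on the runtime classpath, the code from the question works. A minimal sketch of the fixed class (the appName and master settings are my additions, so the example runs locally):
import org.apache.spark.sql.SparkSession;

public class App {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("HiveSupportExample") // illustrative name, not from the question
                .master("local[*]")            // local master so the example is self-contained
                .enableHiveSupport()           // now succeeds: Hive classes are on the classpath
                .getOrCreate();
        spark.stop();
    }
}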
Upvotes: 40