I am trying to integrate Apache Spark with Hive in a multi-node cluster setup. My setup consists of the following machines:
Everything works fine, and I can even create a Spark session from my local environment to the production machines using "thrift://192.XXX.01.04:9863".
On my Hive machine (192.XXX.01.04), I start the required services using:
hive --service metastore
hive --service hiveserver2
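Run as shown, both services stay in the foreground and die with the terminal. A minimal sketch for backgrounding them with captured logs; the helper name and the /tmp paths are illustrative, not from my setup:

```shell
# Hypothetical helper: launch a long-running service in the background,
# writing its output to a log file and recording its PID.
start_service() {
  local name="$1"; shift
  nohup "$@" > "/tmp/${name}.log" 2>&1 &
  echo $! > "/tmp/${name}.pid"
}

# On the Hive machine (assumes `hive` is on PATH):
# start_service metastore   hive --service metastore
# start_service hiveserver2 hive --service hiveserver2
```

Tail `/tmp/metastore.log` afterwards to confirm the metastore actually bound to its port before starting HiveServer2.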
Configurations
hive-env.sh
export HADOOP_HOME=/path/to/hadoop
export HIVE_CONF_DIR=/path/to/apache-hive-3.1.2-bin/conf
export SPARK_HOME=/path/to/Spark
export SPARK_JARS=$(echo $SPARK_HOME/jars/*.jar | tr ' ' ',')
export HIVE_AUX_JARS_PATH=$SPARK_JARS
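One caveat with the `tr ' ' ','` trick above: it silently breaks if any jar path contains a space. A quick self-contained way to sanity-check what `SPARK_JARS` actually expands to (the `/tmp/spark-demo` directory is a throwaway stand-in; point `SPARK_HOME` at your real Spark install in practice):

```shell
# Build a throwaway jars dir so the snippet runs standalone.
SPARK_HOME=/tmp/spark-demo
mkdir -p "$SPARK_HOME/jars"
touch "$SPARK_HOME/jars/a.jar" "$SPARK_HOME/jars/b.jar"

# Same expansion as in hive-env.sh:
SPARK_JARS=$(echo "$SPARK_HOME"/jars/*.jar | tr ' ' ',')
echo "$SPARK_JARS"
# → /tmp/spark-demo/jars/a.jar,/tmp/spark-demo/jars/b.jar

# One jar per line, for easier inspection:
echo "$SPARK_JARS" | tr ',' '\n'
```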
hive-site.xml
(Important properties)
<configuration>
  <property>
    <name>hive.metastore.uri</name>
    <value>thrift://192.XXX.01.04:9083</value>
  </property>
  <property>
    <name>hive.execution.engine</name>
    <value>spark</value>
  </property>
  <property>
    <name>spark.master</name>
    <value>yarn</value> <!-- or local[*] for local mode -->
  </property>
  <property>
    <name>spark.submit.deployMode</name>
    <value>client</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
</configuration>
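One thing worth double-checking: Hive's canonical property name is `hive.metastore.uris` (plural); a misspelled property name is silently ignored rather than rejected. A small self-contained check for this (it writes a sample file to `/tmp` so it runs standalone; in practice, point `CONF` at your real `$HIVE_CONF_DIR/hive-site.xml` and assumes `python3` is available):

```shell
# Sample hive-site.xml so the snippet runs standalone; replace with your real file.
CONF=/tmp/hive-site-sample.xml
cat > "$CONF" <<'EOF'
<configuration>
  <property><name>hive.metastore.uris</name><value>thrift://192.XXX.01.04:9083</value></property>
  <property><name>hive.execution.engine</name><value>spark</value></property>
</configuration>
EOF

# Parse the XML and print the properties Hive will actually see.
python3 - "$CONF" <<'PY'
import sys, xml.etree.ElementTree as ET
props = {p.findtext('name'): p.findtext('value')
         for p in ET.parse(sys.argv[1]).getroot().iter('property')}
for key in ('hive.metastore.uris', 'hive.execution.engine'):
    print(key, '=', props.get(key, '<missing>'))
PY
```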
Spark JARs Included
Because of conflicts between the Spark, Hive, and Hadoop libraries, I kept only the essential Spark JARs:
- spark-hive_2.12-3.4.2.jar
- spark-hive-thriftserver_2.12-3.4.2.jar
- spark-sql_2.12-3.4.2.jar
- mysql-connector-j-8.0.31.jar
Issue
Whenever I run a simple query like:
SELECT COUNT(*) FROM my_table;
I get the following error:
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session dd1bae4e-bbb9-440c-a29f-68968e6b0421)'
FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
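Return code 30041 is generic: it only says the SparkTask could not obtain a Spark session, and the underlying cause (classpath conflict, YARN submission failure, version mismatch) lands in the HiveServer2 log instead. A sketch of pulling it out; the log path and contents below are fabricated for illustration, since the real location depends on your hive-log4j2.properties (often /tmp/$USER/hive.log):

```shell
# Fake log so the snippet runs standalone; grep your real hive.log instead.
LOG=/tmp/hive-demo.log
cat > "$LOG" <<'EOF'
2024-01-01T10:00:00 INFO  SessionState - Hive session started
2024-01-01T10:00:05 ERROR spark.SparkTask - Failed to create Spark client for Spark session
EOF

# Show the last few lines around the Spark client failure:
grep -E 'Failed to create Spark client|SparkTask' "$LOG" | tail -n 5
```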
What I've Tried
Questions
Any help would be greatly appreciated!