Reputation: 383
I cannot configure Spark SQL so that I could access Hive Table in Spark Thrift Server (without using JDBC, but natively from Spark)
I use single configuration file conf/hive-site.xml
for both Spark Thrift Server and Spark SQL. I have javax.jdo.option.ConnectionURL
property set to jdbc:derby:;databaseName=/home/user/spark-2.4.0-bin-hadoop2.7/metastore_db;create=true
. I also set spark.sql.warehouse.dir
property to absolute path pointing to spark-warehouse
directory. I run Thrift server with ./start-thriftserver.sh
and I can observe that embedded Derby database is being created with metastore_db
directory. I can connect with beeline
, create a table and see spark-warehouse
directory created with subdirectory for table. So at this stage it's fine.
I launch pyspark shell with Hive support enabled ./bin/pyspark --conf spark.sql.catalogImplementation=hive
, and try to access the Hive table with:
from pyspark.sql import HiveContext
hc = HiveContext(sc)
hc.sql('show tables')
I got errors like:
ERROR XJ040: Failed to start database '/home/user/spark-2.4.0-bin-hadoop2.7/metastore_db' with class loader sun.misc.Launcher$AppClassLoader@1b4fb997
ERROR XSDB6: Another instance of Derby may have already booted the database /home/user/spark-2.4.0-bin-hadoop2.7/metastore_db
pyspark.sql.utils.AnalysisException: u'java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
Apparently Spark is trying to create new Derby database instead of using Metastore I put in config file. If I stop Thrift Server and run only spark, everything is fine. How could I fix it?
Is Embedded Derby Metastore Database fine to have both Thrift Server and Spark access one Hive or I need to use e.g. MySQL? I don't have a cluster and do everything locally.
Upvotes: 0
Views: 1565
Reputation: 1
Embedded Derby Metastore Database is fine to be used in local, but for production environment, it is recommended to use any other Metastore database.
Yes, you can definitely use MYSQL as metastore. For this, you have to make an entry in hive-site.xml
.
You can follow the configuration guide at Use MySQL for the Hive Metastore for the exact details.
Upvotes: 0