Amir Bashir

Spark Using Wrong Catalog in Hive Metastore: How to Use a Specific Catalog Instead of Default 'hive'?

I am trying to configure Spark to interact with multiple catalogs defined in a single Hive Metastore (v3.0.0). Specifically, my Hive Metastore has two catalogs listed in the CTLGS table: one named hive and another named rawtest. Here's an image of the CTLGS table for reference:

[Image: CTLGS table from the Hive Metastore]

From Spark (v3.4.3), I want to use the rawtest catalog by configuring my Spark session. However, when I attempt to create a schema or run queries, Spark always defaults to the hive catalog instead of using rawtest, even though I have specified rawtest in the configuration.

Here’s what happens:

  1. I set up my Spark session with the rawtest catalog configuration.
  2. I run the following commands:

spark.sql("use rawtest")
spark.sql("create schema test_schema")

But the new schema gets created under the hive catalog, not rawtest.
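
For reference, the catalog the session is actually on at that point can be checked directly (current_catalog() is a built-in Spark SQL function, and spark.catalog.currentCatalog() is available from PySpark 3.4):

# Show the catalog Spark resolves unqualified identifiers against
spark.sql("SELECT current_catalog()").show()

# Same check via the Python catalog API (PySpark 3.4+)
print(spark.catalog.currentCatalog())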

Additionally, when I try to switch catalogs:

spark.sql("use rawtest")
spark.sql("show schemas")

It still shows the schemas under the hive catalog. No matter what catalog name I specify in the Spark configuration, Spark always uses the hive catalog.
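
To separate what the rawtest catalog plugin reports from what Spark's built-in session catalog (named spark_catalog) reports, the namespaces can be listed with explicit catalog qualifiers:

# List namespaces through the configured Iceberg catalog
spark.sql("SHOW NAMESPACES IN rawtest").show()

# List namespaces through Spark's built-in session catalog
spark.sql("SHOW NAMESPACES IN spark_catalog").show()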

Here’s how I am configuring Spark to use the rawtest catalog:

from pyspark.sql import SparkSession

# Register an Iceberg catalog named "rawtest" backed by this Hive Metastore,
# point it at an S3 warehouse path, and make it the session's default catalog.
spark = SparkSession.builder.appName("YourAppName") \
    .config("spark.sql.catalog.rawtest", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.rawtest.type", "hive") \
    .config("spark.sql.catalog.rawtest.uri", "thrift://mydomainname:port/datawarehouse") \
    .config("spark.sql.catalog.rawtest.warehouse", "s3a://data-raw/") \
    .config("spark.sql.defaultCatalog", "rawtest") \
    .getOrCreate()
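
One variant worth noting: Hive 3's metastore client picks its metastore-level catalog from the metastore.catalog.default property (which defaults to hive), and Iceberg's Spark integration can forward Hadoop properties to a catalog via spark.sql.catalog.<name>.hadoop.*. Below is a sketch combining the two, assuming that passthrough reaches the metastore client used by this catalog:

from pyspark.sql import SparkSession

# Sketch: same catalog definition as above, plus metastore.catalog.default
# forwarded to the Hive Metastore client so it targets the "rawtest" catalog
# instead of the metastore's default "hive" catalog. (Assumption: the
# hadoop.* passthrough applies to the HMS client behind this catalog.)
spark = SparkSession.builder.appName("YourAppName") \
    .config("spark.sql.catalog.rawtest", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.rawtest.type", "hive") \
    .config("spark.sql.catalog.rawtest.uri", "thrift://mydomainname:port/datawarehouse") \
    .config("spark.sql.catalog.rawtest.warehouse", "s3a://data-raw/") \
    .config("spark.sql.catalog.rawtest.hadoop.metastore.catalog.default", "rawtest") \
    .config("spark.sql.defaultCatalog", "rawtest") \
    .getOrCreate()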

When I run the following command:

spark.sql("create schema rawtest.test_schema")

I expect the schema to be created under the rawtest catalog, as specified in the configuration. However, the schema gets created under the hive catalog instead.
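
From the Spark side, where the schema actually landed can be narrowed down with DESCRIBE NAMESPACE EXTENDED; whichever of these fails with a schema-not-found error tells you which catalog does not hold it:

# Inspect the schema through each catalog to see where it really lives
spark.sql("DESCRIBE NAMESPACE EXTENDED rawtest.test_schema").show(truncate=False)
spark.sql("DESCRIBE NAMESPACE EXTENDED spark_catalog.test_schema").show(truncate=False)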

Here's an image of the DBS table from the Hive Metastore, where you can see that the schema was created under hive:

[Image: DBS table from the Hive Metastore]

I would like to understand why Spark keeps defaulting to the hive catalog and how to ensure that it uses the rawtest catalog as expected.

Upvotes: 0

Views: 143

Answers (0)
