I am trying to configure Spark to interact with multiple catalogs defined in a single Hive Metastore (v3.0.0). Specifically, my Hive Metastore has two catalogs listed in the CTLGS table: one named hive and another named rawtest. Here's the CTLGS table for reference:
[Image: CTLGS table from the Hive Metastore]
From Spark (v3.4.3), I want to use the rawtest catalog by configuring my Spark session. However, when I attempt to create a schema or run queries, Spark always defaults to the hive catalog instead of using rawtest, even though I have specified rawtest in the configuration.
Here’s what happens:
spark.sql("use rawtest")
spark.sql("create schema test_schema")
But the new schema gets created under the hive catalog, not rawtest.
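As a sanity check, I can also ask the session which catalog it considers current right after the USE statement (a quick sketch using the spark.catalog API available in Spark 3.4+):

spark.sql("use rawtest")
# Ask Spark which catalog is currently active for this session
print(spark.catalog.currentCatalog())  # I would expect this to print 'rawtest'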
Additionally, when I try to switch catalogs:
spark.sql("use rawtest")
spark.sql("show schemas")
It still shows the schemas under the hive catalog. No matter what catalog name I specify in the Spark configuration, Spark always uses the hive catalog.
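For completeness, the catalogs Spark itself has registered can be listed directly (a sketch; the SHOW CATALOGS statement exists as of Spark 3.4):

# List every catalog registered with this Spark session
spark.sql("show catalogs").show()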
Here’s how I am configuring Spark to use the rawtest catalog:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("YourAppName")
    # Register an Iceberg catalog named "rawtest" backed by the Hive Metastore
    .config("spark.sql.catalog.rawtest", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.rawtest.type", "hive")
    .config("spark.sql.catalog.rawtest.uri", "thrift://mydomainname:port/datawarehouse")
    .config("spark.sql.catalog.rawtest.warehouse", "s3a://data-raw/")
    # Make "rawtest" the default catalog for this session
    .config("spark.sql.defaultCatalog", "rawtest")
    .getOrCreate()
)
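To rule out the settings simply not reaching the session, the effective values can be read back through the standard spark.conf API (a small diagnostic sketch):

# Confirm the catalog settings actually took effect in this session
print(spark.conf.get("spark.sql.defaultCatalog"))   # expected: rawtest
print(spark.conf.get("spark.sql.catalog.rawtest"))  # expected: org.apache.iceberg.spark.SparkCatalog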
When I run the following command:
spark.sql("create schema rawtest.test_schema")
I expect the schema to be created under the rawtest catalog, as specified in the configuration. However, the schema gets created under the hive catalog instead.
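To see where the namespace actually landed, I can list namespaces through both catalog-qualified forms (a sketch; spark_catalog is Spark's built-in name for the session catalog):

# Namespaces visible through the Iceberg catalog I configured
spark.sql("show namespaces in rawtest").show()
# Namespaces visible through the built-in session catalog, for comparison
spark.sql("show namespaces in spark_catalog").show()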
Here's the DBS table from the Hive Metastore, where you can see the schema being created under hive:
[Image: DBS table from the Hive Metastore showing the new schema under the hive catalog]
I would like to understand why Spark keeps defaulting to the hive catalog and how to ensure that it uses the rawtest catalog as expected.