I'm trying to create an AWS Glue job to test Apache Iceberg, following the default tutorial here. I'm getting the error "Failed to connect to Hive Metastore".
Other Stack Overflow posts with this error are from people who actually want Hive. I do not want Hive; I want to use the AWS Glue Data Catalog, and I have zero references to Hive anywhere in my script. Why is AWS Glue still looking for Hive?
Here's my code:
from pyspark.sql import SparkSession
from pyspark import SparkConf
from awsglue.context import GlueContext

DB_NAME = "default"
CATALOG_NAME = "glue_catalog"  # the AWS Glue Data Catalog is pre-configured for the Spark libraries as glue_catalog
TABLE_NAME = "table1"

conf = (
    SparkConf().setAppName("Spark Test")
    .set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .set(f"spark.sql.catalog.{CATALOG_NAME}", "org.apache.iceberg.spark.SparkCatalog")  # doesn't this enable Iceberg support?
    .set(f"spark.sql.catalog.{CATALOG_NAME}.warehouse", "s3://<your-warehouse-dir>")
    .set(f"spark.sql.catalog.{CATALOG_NAME}.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .set(f"spark.sql.catalog.{CATALOG_NAME}.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
)

# create the Spark session
spark = (
    SparkSession.builder
    .appName("Python Spark Iceberg example")
    .config(conf=conf)
    .getOrCreate()
)

glueContext = GlueContext(spark.sparkContext)  # not sure what this is for, but I tried it; no difference seen
#directly from https://docs.aws.amazon.com/prescriptive-guidance/latest/apache-iceberg-on-aws/iceberg-spark.html
glueContext.sql(f"""
CREATE TABLE IF NOT EXISTS {CATALOG_NAME}.{DB_NAME}.{TABLE_NAME}_nopartitions (
c_customer_sk int,
c_customer_id string,
c_first_name string,
c_last_name string,
c_birth_country string,
c_email_address string)
USING iceberg
OPTIONS ('format-version'='2')
""")
#### ERROR "Failed to connect to Hive Metastore" #####
#######################################################
glueContext.sql(f"""
INSERT INTO {CATALOG_NAME}.{DB_NAME}.{TABLE_NAME}_nopartitions
SELECT c_customer_sk, c_customer_id, c_first_name, c_last_name, c_birth_country, c_email_address
FROM another_table
""")
Error log:
2025-01-30 20:53:44,898 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(77)): Error from Python:Traceback (most recent call last):
  File "/tmp/main.py", line 85, in <module>
    spark.sql(f"""
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1034, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o104.sql.
: org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore
    at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:84)
What I tried:
- --enable-glue-datacatalog = true: did nothing.
- enableHiveSupport(): did nothing.
- SparkSessionCatalog
- glueSession.sql and spark.sql everywhere

I'm stumped. I'm not using Hive. What's going on?
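Re-reading my conf, I also wrote a small pure-Python sketch (no Spark involved, just the same keys as a dict) to double-check which catalogs I actually give an explicit catalog-impl. As far as I can tell, only glue_catalog gets one; the spark_catalog override has none, so maybe it falls back to some default metastore, but I'm not sure that's how it works:

```python
# Sketch: the per-catalog Spark conf keys from my job, as a plain dict,
# to see which named catalogs have an explicit catalog-impl set.
CATALOG_NAME = "glue_catalog"

conf = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    f"spark.sql.catalog.{CATALOG_NAME}": "org.apache.iceberg.spark.SparkCatalog",
    f"spark.sql.catalog.{CATALOG_NAME}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    f"spark.sql.catalog.{CATALOG_NAME}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    # session catalog override -- note I never set a catalog-impl for it
    "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog",
}

# map catalog name -> its catalog-impl, for every key ending in ".catalog-impl"
impls = {k.split(".")[3]: v for k, v in conf.items() if k.endswith(".catalog-impl")}
print(impls)  # {'glue_catalog': 'org.apache.iceberg.aws.glue.GlueCatalog'}
```

So the only catalog explicitly pointed at Glue is glue_catalog, yet the error mentions Hive anyway.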
Any insight is appreciated.