Juili Chormole

AWS Glue Access Issues on Spark on EKS

I am deploying Spark 3.5.3 on EKS using the Spark Operator and trying to connect it to the AWS Glue Data Catalog. However, SHOW DATABASES only lists the default database, and it looks like Spark is still falling back to a local Derby metastore instead of Glue. The Spark driver logs are as follows:
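For reference, the driver script is essentially the following (a minimal sketch of my job; the app name `SparkWithGlue` and the printed line match the logs, the rest is reconstructed — catalog settings come in via the Spark Operator's sparkConf, not the script):

```python
from pyspark.sql import SparkSession

# All Glue/Hive catalog settings are passed through sparkConf in the
# SparkApplication manifest, so the script only builds a session and queries.
spark = (
    SparkSession.builder
    .appName("SparkWithGlue")
    .enableHiveSupport()  # needed so spark.sql.catalogImplementation=hive takes effect
    .getOrCreate()
)

print("Showing databases from Glue Data Catalog:")
spark.sql("SHOW DATABASES").show()

spark.stop()
```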

2025-02-12T08:37:12,463 WARN [main] org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2025-02-12T08:37:14,609 INFO [Thread-6] org.apache.spark.SparkContext - Running Spark version 3.5.3
2025-02-12T08:37:14,610 INFO [Thread-6] org.apache.spark.SparkContext - OS info Linux, 6.1.124, amd64
2025-02-12T08:37:14,612 INFO [Thread-6] org.apache.spark.SparkContext - Java version 11.0.24
2025-02-12T08:37:14,646 INFO [Thread-6] org.apache.spark.resource.ResourceUtils - ==============================================================
2025-02-12T08:37:14,646 INFO [Thread-6] org.apache.spark.resource.ResourceUtils - No custom resources configured for spark.driver.
2025-02-12T08:37:14,648 INFO [Thread-6] org.apache.spark.resource.ResourceUtils - ==============================================================
2025-02-12T08:37:14,648 INFO [Thread-6] org.apache.spark.SparkContext - Submitted application: SparkWithGlue
2025-02-12T08:37:14,668 INFO [Thread-6] org.apache.spark.resource.ResourceProfile - Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
2025-02-12T08:37:14,678 INFO [Thread-6] org.apache.spark.resource.ResourceProfile - Limiting resource is cpus at 1 tasks per executor
2025-02-12T08:37:14,681 INFO [Thread-6] org.apache.spark.resource.ResourceProfileManager - Added ResourceProfile id: 0
2025-02-12T08:37:14,736 INFO [Thread-6] org.apache.spark.SecurityManager - Changing view acls to: spark,hadoop
2025-02-12T08:37:14,737 INFO [Thread-6] org.apache.spark.SecurityManager - Changing modify acls to: spark,hadoop
2025-02-12T08:37:14,738 INFO [Thread-6] org.apache.spark.SecurityManager - Changing view acls groups to: 
2025-02-12T08:37:14,738 INFO [Thread-6] org.apache.spark.SecurityManager - Changing modify acls groups to: 
2025-02-12T08:37:14,739 INFO [Thread-6] org.apache.spark.SecurityManager - SecurityManager: authentication enabled; ui acls disabled; users with view permissions: spark, hadoop; groups with view permissions: EMPTY; users with modify permissions: spark, hadoop; groups with modify permissions: EMPTY
2025-02-12T08:37:14,973 INFO [Thread-6] org.apache.spark.util.Utils - Successfully started service 'sparkDriver' on port 7078.
2025-02-12T08:37:15,007 INFO [Thread-6] org.apache.spark.SparkEnv - Registering MapOutputTracker
2025-02-12T08:37:15,044 INFO [Thread-6] org.apache.spark.SparkEnv - Registering BlockManagerMaster
2025-02-12T08:37:15,062 INFO [Thread-6] org.apache.spark.storage.BlockManagerMasterEndpoint - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2025-02-12T08:37:15,063 INFO [Thread-6] org.apache.spark.storage.BlockManagerMasterEndpoint - BlockManagerMasterEndpoint up
2025-02-12T08:37:15,066 INFO [Thread-6] org.apache.spark.SparkEnv - Registering BlockManagerMasterHeartbeat
2025-02-12T08:37:15,084 INFO [Thread-6] org.apache.spark.storage.DiskBlockManager - Created local directory at /var/data/spark-a1ec6bec-ba09-440d-8e92-a13799ad0252/blockmgr-a41424d8-e84a-441f-ab67-32ec75cf3a59
2025-02-12T08:37:15,105 INFO [Thread-6] org.apache.spark.storage.memory.MemoryStore - MemoryStore started with capacity 413.9 MiB
2025-02-12T08:37:15,118 INFO [Thread-6] org.apache.spark.SparkEnv - Registering OutputCommitCoordinator
2025-02-12T08:37:15,150 INFO [Thread-6] org.sparkproject.jetty.util.log - Logging initialized @4708ms to org.sparkproject.jetty.util.log.Slf4jLog
2025-02-12T08:37:15,226 INFO [Thread-6] org.apache.spark.ui.JettyUtils - Start Jetty 0.0.0.0:4040 for SparkUI
2025-02-12T08:37:15,236 INFO [Thread-6] org.sparkproject.jetty.server.Server - jetty-9.4.54.v20240208; built: 2024-02-08T19:42:39.027Z; git: cef3fbd6d736a21e7d541a5db490381d95a2047d; jvm 11.0.24+8
2025-02-12T08:37:15,253 INFO [Thread-6] org.sparkproject.jetty.server.Server - Started @4812ms
2025-02-12T08:37:15,285 INFO [Thread-6] org.sparkproject.jetty.server.AbstractConnector - Started ServerConnector@2fd336e4{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2025-02-12T08:37:15,285 INFO [Thread-6] org.apache.spark.util.Utils - Successfully started service 'SparkUI' on port 4040.
2025-02-12T08:37:15,313 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@27f5f3c6{/,null,AVAILABLE,@Spark}
2025-02-12T08:37:15,389 INFO [Thread-6] org.apache.spark.deploy.k8s.SparkKubernetesClientFactory - Auto-configuring K8S client using current context from users K8S config file
2025-02-12T08:37:16,337 INFO [kubernetes-executor-snapshots-subscribers-0] org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator - Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
2025-02-12T08:37:16,377 INFO [kubernetes-executor-snapshots-subscribers-0] org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator - Found 0 reusable PVCs from 0 PVCs
2025-02-12T08:37:16,499 INFO [kubernetes-executor-snapshots-subscribers-0] org.apache.spark.deploy.k8s.features.BasicExecutorFeatureStep - Decommissioning not enabled, skipping shutdown script
2025-02-12T08:37:16,694 INFO [Thread-6] org.apache.spark.util.Utils - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
2025-02-12T08:37:16,695 INFO [Thread-6] org.apache.spark.network.netty.NettyBlockTransferService - Server created on spark-sql-job-juili-ee0a0a94f94e050b-driver-svc.spark-emr-operator.svc 10.50.58.179:7079
2025-02-12T08:37:16,697 INFO [Thread-6] org.apache.spark.storage.BlockManager - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2025-02-12T08:37:16,703 INFO [Thread-6] org.apache.spark.storage.BlockManagerMaster - Registering BlockManager BlockManagerId(driver, spark-sql-job-juili-ee0a0a94f94e050b-driver-svc.spark-emr-operator.svc, 7079, None)
2025-02-12T08:37:16,705 INFO [dispatcher-BlockManagerMaster] org.apache.spark.storage.BlockManagerMasterEndpoint - Registering block manager spark-sql-job-juili-ee0a0a94f94e050b-driver-svc.spark-emr-operator.svc:7079 with 413.9 MiB RAM, BlockManagerId(driver, spark-sql-job-juili-ee0a0a94f94e050b-driver-svc.spark-emr-operator.svc, 7079, None)
2025-02-12T08:37:16,710 INFO [Thread-6] org.apache.spark.storage.BlockManagerMaster - Registered BlockManager BlockManagerId(driver, spark-sql-job-juili-ee0a0a94f94e050b-driver-svc.spark-emr-operator.svc, 7079, None)
2025-02-12T08:37:16,711 INFO [Thread-6] org.apache.spark.storage.BlockManager - Initialized BlockManager: BlockManagerId(driver, spark-sql-job-juili-ee0a0a94f94e050b-driver-svc.spark-emr-operator.svc, 7079, None)
2025-02-12T08:37:16,773 INFO [Thread-6] org.apache.spark.deploy.history.SingleEventLogFileWriter - Logging events to file:/tmp/spark-a0425570b1b845d1a88374eaf7ba0055.inprogress
2025-02-12T08:37:16,926 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Stopped o.s.j.s.ServletContextHandler@27f5f3c6{/,null,STOPPED,@Spark}
2025-02-12T08:37:16,928 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@5110ac2f{/jobs,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,929 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@4e3efff6{/jobs/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,931 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@8de8abe{/jobs/job,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,932 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@292a8317{/jobs/job/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,933 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@50054bd6{/stages,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,934 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@2d6552e4{/stages/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,936 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@1c6317bb{/stages/stage,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,937 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@37d5ffec{/stages/stage/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,938 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@68572698{/stages/pool,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,941 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@12dd1cc0{/stages/pool/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,943 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@46b495aa{/storage,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,944 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@4baaccbb{/storage/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,945 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@ee1b134{/storage/rdd,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,946 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@7d5c5bb1{/storage/rdd/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,947 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@41940988{/environment,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,949 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@4a70438e{/environment/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,952 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@12c815e5{/executors,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,953 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@7ffd69a8{/executors/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,956 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@5294550b{/executors/threadDump,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,956 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@3810cd70{/executors/threadDump/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,957 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@408b950c{/executors/heapHistogram,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,958 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@707f64f9{/executors/heapHistogram/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,966 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@619a561b{/static,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,967 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@2d0f1265{/,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,971 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@16837113{/api,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,973 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@3899a8b3{/jobs/job/kill,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,975 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@401acafb{/stages/stage/kill,null,AVAILABLE,@Spark}
2025-02-12T08:37:16,984 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@1c9b77ac{/metrics/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:46,213 INFO [Thread-6] org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend - SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000000000(ns)
Showing databases from Glue Data Catalog:
2025-02-12T08:37:46,369 INFO [Thread-6] org.apache.spark.sql.internal.SharedState - Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
2025-02-12T08:37:46,372 INFO [Thread-6] org.apache.spark.sql.internal.SharedState - Warehouse path is 'file:/opt/spark/spark-warehouse'.
2025-02-12T08:37:46,386 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@45e04336{/SQL,null,AVAILABLE,@Spark}
2025-02-12T08:37:46,387 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@6ef04c4b{/SQL/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:46,388 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@39eb2062{/SQL/execution,null,AVAILABLE,@Spark}
2025-02-12T08:37:46,389 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@70acafe4{/SQL/execution/json,null,AVAILABLE,@Spark}
2025-02-12T08:37:46,391 INFO [Thread-6] org.sparkproject.jetty.server.handler.ContextHandler - Started o.s.j.s.ServletContextHandler@99c284d{/static/sql,null,AVAILABLE,@Spark}
2025-02-12T08:37:48,253 INFO [Thread-6] org.apache.hadoop.hive.conf.HiveConf - Found configuration file null
2025-02-12T08:37:48,264 INFO [Thread-6] org.apache.spark.sql.hive.HiveUtils - Initializing HiveMetastoreConnection version 2.3.9 using Spark classes.
2025-02-12T08:37:48,507 INFO [Thread-6] org.apache.spark.sql.hive.client.HiveClientImpl - Warehouse location for Hive client (version 2.3.9) is file:/opt/spark/spark-warehouse
2025-02-12T08:37:48,608 WARN [Thread-6] org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.jdbc.timeout does not exist
2025-02-12T08:37:48,610 WARN [Thread-6] org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.glue.catalogid does not exist
2025-02-12T08:37:48,610 WARN [Thread-6] org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.client.factory.class does not exist
2025-02-12T08:37:48,611 WARN [Thread-6] org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.retries.wait does not exist
2025-02-12T08:37:48,611 WARN [Thread-6] org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.imetastoreclient.factory.class does not exist
2025-02-12T08:37:48,611 INFO [Thread-6] org.apache.hadoop.hive.metastore.HiveMetaStore - 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
2025-02-12T08:37:48,642 INFO [Thread-6] org.apache.hadoop.hive.metastore.ObjectStore - ObjectStore, initialize called
2025-02-12T08:37:48,790 INFO [Thread-6] DataNucleus.Persistence - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
2025-02-12T08:37:48,791 INFO [Thread-6] DataNucleus.Persistence - Property datanucleus.cache.level2 unknown - will be ignored
2025-02-12T08:37:50,246 INFO [Thread-6] org.apache.hadoop.hive.metastore.ObjectStore - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2025-02-12T08:37:52,983 INFO [Thread-6] org.apache.hadoop.hive.metastore.MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
2025-02-12T08:37:52,986 INFO [Thread-6] org.apache.hadoop.hive.metastore.ObjectStore - Initialized ObjectStore
2025-02-12T08:37:53,058 WARN [Thread-6] org.apache.hadoop.hive.metastore.ObjectStore - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
2025-02-12T08:37:53,058 WARN [Thread-6] org.apache.hadoop.hive.metastore.ObjectStore - setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore [email protected]
2025-02-12T08:37:53,080 WARN [Thread-6] org.apache.hadoop.hive.metastore.ObjectStore - Failed to get database default, returning NoSuchObjectException
2025-02-12T08:37:53,281 INFO [Thread-6] org.apache.hadoop.hive.metastore.HiveMetaStore - Added admin role in metastore
2025-02-12T08:37:53,286 INFO [Thread-6] org.apache.hadoop.hive.metastore.HiveMetaStore - Added public role in metastore
2025-02-12T08:37:53,358 INFO [Thread-6] org.apache.hadoop.hive.metastore.HiveMetaStore - No user is added in admin role, since config is empty
2025-02-12T08:37:53,413 INFO [Thread-6] org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_database: default
2025-02-12T08:37:53,415 INFO [Thread-6] org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=spark    ip=unknown-ip-addr  cmd=get_database: default   
2025-02-12T08:37:53,432 INFO [Thread-6] org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_databases: *
2025-02-12T08:37:53,432 INFO [Thread-6] org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=spark    ip=unknown-ip-addr  cmd=get_databases: *    
2025-02-12T08:37:53,724 INFO [Thread-6] org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - Code generated in 198.605119 ms
2025-02-12T08:37:53,863 INFO [Thread-6] org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - Code generated in 9.529964 ms
2025-02-12T08:37:55,066 INFO [Thread-6] org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - Code generated in 19.875985 ms
+---------+
|namespace|
+---------+
|  default|
+---------+

2025-02-12T08:37:55,099 INFO [main] org.apache.spark.SparkContext - SparkContext is stopping with exitCode 0.
2025-02-12T08:37:55,106 INFO [main] org.sparkproject.jetty.server.AbstractConnector - Stopped Spark@2fd336e4{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2025-02-12T08:37:55,110 INFO [main] org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://spark-sql-job-juili-ee0a0a94f94e050b-driver-svc.spark-emr-operator.svc:4040
2025-02-12T08:37:55,114 INFO [main] org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend - Shutting down all executors
2025-02-12T08:37:55,118 INFO [dispatcher-CoarseGrainedScheduler] org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint - Asking each executor to shut down
2025-02-12T08:37:55,124 WARN [-1601449223-pool-21-thread-2] org.apache.spark.scheduler.cluster.k8s.ExecutorPodsWatchSnapshotSource - Kubernetes client has been closed.
2025-02-12T08:37:55,254 INFO [dispatcher-event-loop-1] org.apache.spark.MapOutputTrackerMasterEndpoint - MapOutputTrackerMasterEndpoint stopped!
2025-02-12T08:37:55,269 INFO [main] org.apache.spark.storage.memory.MemoryStore - MemoryStore cleared
2025-02-12T08:37:55,270 INFO [main] org.apache.spark.storage.BlockManager - BlockManager stopped
2025-02-12T08:37:55,280 INFO [main] org.apache.spark.storage.BlockManagerMaster - BlockManagerMaster stopped
2025-02-12T08:37:55,282 INFO [dispatcher-event-loop-0] org.apache.spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint - OutputCommitCoordinator stopped!
2025-02-12T08:37:55,304 INFO [main] org.apache.spark.SparkContext - Successfully stopped SparkContext
2025-02-12T08:37:55,309 INFO [shutdown-hook-0] org.apache.spark.util.ShutdownHookManager - Shutdown hook called
2025-02-12T08:37:55,310 INFO [shutdown-hook-0] org.apache.spark.util.ShutdownHookManager - Deleting directory /tmp/spark-6cd1ef48-3d60-4c0c-93a6-a942b852d542
2025-02-12T08:37:55,315 INFO [shutdown-hook-0] org.apache.spark.util.ShutdownHookManager - Deleting directory /var/data/spark-a1ec6bec-ba09-440d-8e92-a13799ad0252/spark-530be58b-6854-4519-ae5f-6aef84054b60
2025-02-12T08:37:55,320 INFO [shutdown-hook-0] org.apache.spark.util.ShutdownHookManager - Deleting directory /var/data/spark-a1ec6bec-ba09-440d-8e92-a13799ad0252/spark-530be58b-6854-4519-ae5f-6aef84054b60/pyspark-0de5a31b-0a40-4a17-b572-45fa0a4bb9f0
2025-02-12T08:37:55,329 INFO [shutdown-hook-0] org.apache.hadoop.metrics2.impl.MetricsSystemImpl - Stopping s3a-file-system metrics system...
2025-02-12T08:37:55,329 INFO [shutdown-hook-0] org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system stopped.
2025-02-12T08:37:55,329 INFO [shutdown-hook-0] org.apache.hadoop.metrics2.impl.MetricsSystemImpl - s3a-file-system metrics system shutdown complete.

Steps I Have Followed So Far:

  1. Deployed Spark 3.5.3 on EKS using the Spark Operator, and configured a ServiceAccount in EKS with the necessary IAM role and policies.
  2. Built and attached the required JARs (aws-glue-datacatalog-client-common-3.4.0-SNAPSHOT.jar, aws-glue-datacatalog-hive3-client-3.4.0-SNAPSHOT.jar, aws-glue-datacatalog-spark-client-3.4.0-SNAPSHOT.jar) following the AWS repo: AWS Glue Data Catalog Client for Apache Hive Metastore.
  3. Used the following configuration:
     spark.sql.catalogImplementation=hive
     spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
     spark.hive.metastore.glue.catalogid=***********
     spark.hadoop.aws.region=us-east-1
  4. Executed a SHOW DATABASES query: the output lists only the default database instead of all the databases in Glue.
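For context, these settings are supplied through the Spark Operator manifest, roughly like this (a sketch mirroring the keys listed above; the image and application file are placeholders, the catalog ID is redacted, and the job name/namespace are taken from the logs):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-sql-job-juili
  namespace: spark-emr-operator
spec:
  type: Python
  mode: cluster
  image: <spark-3.5.3-image-with-glue-jars>                        # placeholder
  mainApplicationFile: local:///opt/spark/app/spark_with_glue.py   # placeholder
  sparkVersion: "3.5.3"
  sparkConf:
    spark.sql.catalogImplementation: "hive"
    spark.hadoop.hive.metastore.client.factory.class: "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    spark.hive.metastore.glue.catalogid: "***********"             # redacted
    spark.hadoop.aws.region: "us-east-1"
  driver:
    serviceAccount: spark-glue-sa   # placeholder; bound to the IAM role via IRSA
```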

Key Observations:

  1. The logs warn "HiveConf of name hive.metastore.client.factory.class does not exist", even though the property is explicitly set.
  2. No Glue databases are fetched; only the default database is returned.

Issue I’m Facing: Despite following these steps, I am unable to list the Glue databases from Spark. It seems that Spark does not recognize the Glue Catalog, and I’m not sure whether the problem lies in the dependencies, the IAM permissions, or the Spark configuration.

I'd welcome input on why the Glue databases aren't displayed. Is there a missing setup step, or are the JARs I built incompatible with Spark 3.5.3? Thanks in advance for your help! 😊
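In case IAM is the culprit: the ServiceAccount's IAM role includes Glue read permissions along these lines (a minimal sketch of what I believe is required, not my exact policy; the wildcard Resource is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    }
  ]
}
```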
