Prabodh Mhalgi

Reputation: 897

How to access existing Glue catalog from EMR?

I have a Glue Data Catalog in my account with one database and one table. [Screenshot: Glue catalog in the AWS console]

I followed this guide from AWS and created my EMR cluster. However, when I run spark-shell and try to access the Glue catalog, the database from the catalog does not show up in EMR. [Screenshot: terminal showing spark-shell]

What am I missing?

Upvotes: 0

Views: 2216

Answers (2)

Prabodh Mhalgi

Reputation: 897

This turned out to be a non-issue. I was trying to launch an EMR cluster in us-east-1, and for some reason the cluster never finished provisioning, even though the underlying EC2 instances were provisioned and in the running state. I was able to SSH into the instances and run spark-shell on them too.

I then launched an EMR cluster in us-east-2 and it was provisioned completely. From that cluster I was able to connect to the Glue catalog successfully.

Upvotes: 0

Sajjan Bhattarai

Reputation: 111

It doesn't look like Spark is using the Glue Data Catalog in your cluster. Did you enable the Glue catalog option for Spark when creating the cluster? For an existing cluster, you can check the cluster's Configurations in the console. It should contain something like this:

[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
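
If you would rather check this from a script than by eye, here is a minimal Python sketch. It assumes you have the cluster's configuration list as JSON in the same shape as above (for example copied from the console). The helper name `spark_uses_glue` is made up for illustration, and it only looks at top-level entries, not nested configurations.

```python
import json

# Property and value taken from the spark-hive-site snippet above.
FACTORY_KEY = "hive.metastore.client.factory.class"
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore."
    "AWSGlueDataCatalogHiveClientFactory"
)


def spark_uses_glue(configurations):
    """Return True if a top-level spark-hive-site entry sets the Glue factory.

    `configurations` is assumed to be a list of dicts shaped like the
    EMR configuration JSON shown above (hypothetical helper, not an AWS API).
    """
    for entry in configurations:
        if entry.get("Classification") != "spark-hive-site":
            continue
        properties = entry.get("Properties") or {}
        if properties.get(FACTORY_KEY) == GLUE_FACTORY:
            return True
    return False


def spark_uses_glue_json(text):
    """Same check, starting from the raw JSON text of the configuration list."""
    return spark_uses_glue(json.loads(text))
```

Calling `spark_uses_glue_json` on the JSON above should return `True`; a cluster created without the Glue option for Spark would typically have no matching `spark-hive-site` entry, so the check returns `False`.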

If your cluster has the above config set and Spark is still unable to fetch metadata from the Glue catalog, you may want to enable DEBUG-level logging in Spark to get more details.

Upvotes: 1
