danieln

Reputation: 4973

Create an EMR Hive cluster with the Glue Data Catalog using the AWS CLI

I would like to use the AWS CLI to create an EMR Hive cluster that uses Glue as its data catalog.
I couldn't find anything about this in the AWS docs or elsewhere.
Is this possible?

Upvotes: 1

Views: 711

Answers (1)

aksyuma

Reputation: 3180

First, create a configuration file named emr.json containing a hive-site classification that sets the AWS Glue Data Catalog as the metastore for Hive:

[
  {
    "Classification": "hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
      "hive.metastore.schema.verification": "false"
    }
  }
]
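One way to create this file straight from the shell is a heredoc. This is only a minimal sketch: it writes the same JSON as above to emr.json in the current directory and, assuming python3 happens to be installed, checks that the file parses as JSON.

```shell
# Write the hive-site classification shown above to emr.json.
# The quoted 'EOF' delimiter keeps the shell from expanding anything inside.
cat > emr.json <<'EOF'
[
  {
    "Classification": "hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
      "hive.metastore.schema.verification": "false"
    }
  }
]
EOF

# Optional sanity check (assumes python3 is available; skipped otherwise).
if command -v python3 >/dev/null 2>&1; then
  python3 -m json.tool emr.json >/dev/null && echo "emr.json is valid JSON"
fi
```

Catching a JSON typo here is cheaper than finding out when create-cluster rejects the file.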

Note: On EMR release versions 5.28.0, 5.28.1, and 5.29.0, set hive.metastore.schema.verification to false when creating a cluster that uses the AWS Glue Data Catalog as the metastore. This prevents Hive from validating the metastore schema.

Finally, pass the configuration file to the create-cluster command:

aws emr create-cluster \
  --name "syumaK-cluster" \
  --configurations file://emr.json \
  --release-label emr-5.28.0 \
  --use-default-roles \
  --applications Name=Hadoop Name=Spark Name=Hive Name=HUE \
  --instance-groups \
    InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.medium \
    InstanceGroupType=CORE,InstanceCount=2,InstanceType=m1.medium

Response:

{
    "ClusterId": "j-2NZ6xxxxxx", 
    "ClusterArn": "arn:aws:elasticmapreduce:us-east-1:1925xxxxx:cluster/j-2NZ6xxxxxx"
}
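To check that the cluster actually came up, you can poll its state with describe-cluster. This is a rough sketch, assuming the AWS CLI is already configured with credentials and a default region; cluster_state is a hypothetical helper name, not something the CLI provides.

```shell
# Hypothetical helper: print the current state of an EMR cluster
# (for example STARTING, RUNNING, or WAITING).
cluster_state() {
  aws emr describe-cluster \
    --cluster-id "$1" \
    --query 'Cluster.Status.State' \
    --output text
}

# Usage: pass the ClusterId returned by create-cluster, e.g.
# cluster_state j-2NZ6xxxxxx
```

Once the cluster is up, running hive -e 'SHOW DATABASES;' on the master node should list your Glue databases if the catalog is wired up correctly.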

Hope this helps!

Upvotes: 2
