rm916463

Reputation: 35

Multiple values in EMR Cluster Configuration template

Within my EMR module I have a template that is deployed for the cluster configuration. This template holds all of the cluster configuration requirements for the classification types specified in the emr_cluster_applications variable, e.g. Spark, Hadoop, Hive.

Visual:

emr_cluster_applications = ["Spark", "Hadoop", "Hive"]
emr_cluster_configurations = file("./filepath/to/template.json")

This setup works fine; however, moving forward I'm wondering whether the template can be populated based on the values in the emr_cluster_applications variable.

For example, in a separate deployment, if ["Spark", "Hadoop"] were specified instead of all three, the template file would only use the corresponding Spark and Hadoop configuration, with Hive being ignored even though it is still present in the file - is this possible?

Update: Template file:

[
  {
    "Classification": "spark",
    "Properties":{
       "maximizeResourceAllocation": "false",
       "spark.executor.memoryOverhead": "4G"
     }
  },
  {
    "Classification": "hive",
    "Properties":{
      "javax.jdo.option.ConnectionURL": XXXX
      "javax.jdo.option.ConnectionDriverName": XXXX
      "javax.jdo.option.ConnectionUserName": XXXX
      "javax.jdo.option.ConnectionPassword": XXXX
     }
  },
  {
     "Classification": "hbase-site",
     "Properties": {
        "hbase.rootdir": "XXXXXXXXXX"
      }
  },
  {
     "Classification": "hbase",
     "Properties":{
        "hbase.emr.storageMode": "s3"
        "hbase.emr.readreplica.emnabled": "true"
      }
   }
]

Upvotes: 1

Views: 651

Answers (1)

Marko E

Reputation: 18203

This is the best I could come up with and there might be better solutions, so take it with a grain of salt. I had problems mapping Hadoop to two different elements from the JSON, so I had to modify the variables a bit to make it work. I strongly suggest doing any variable manipulation within a locals block in order to avoid clutter in the resources. The locals.tf example:

locals {

  emr_template = [
    {
      "Classification" : "spark",
      "Properties" : {
        "maximizeResourceAllocation" : "false",
        "spark.executor.memoryOverhead" : "4G"
      }
    },
    {
      "Classification" : "hive",
      "Properties" : {
        "javax.jdo.option.ConnectionURL" : "XXXX",
        "javax.jdo.option.ConnectionDriverName" : "XXXX",
        "javax.jdo.option.ConnectionUserName" : "XXXX",
        "javax.jdo.option.ConnectionPassword" : "XXXX"
      }
    },
    {
      "Classification" : "hbase-site",
      "Properties" : {
        "hbase.rootdir" : "XXXXXXXXXX"
      }
    },
    {
      "Classification" : "hbase",
      "Properties" : {
        "hbase.emr.storageMode" : "s3",
        "hbase.emr.readreplica.emnabled" : "true"
      }
    }
  ]

  emr_template_mapping = { for template in local.emr_template : template.Classification => template }
  
  hadoop_enabled           = false
  hadoop                   = local.hadoop_enabled ? ["hbase", "hbase-site"] : []
  apps_enabled             = ["spark", "hive"]
  emr_cluster_applications = concat(local.apps_enabled, local.hadoop)

}

You can control which apps are added with two options:

  1. If Hadoop is enabled, hbase and hbase-site are added to the list of allowed apps. If it is not enabled, the value of the hadoop local will be an empty list.
  2. In the apps_enabled local variable you decide which of the remaining apps you want to enable, i.e., spark, hive, both, or neither.

Finally, the emr_cluster_applications local variable uses concat to concatenate the two lists into one.
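
If you would rather keep driving this from the original emr_cluster_applications variable from the question instead of hard-coded booleans, you could replace the last three locals with something along these lines. The application-to-classification map here is just my assumption, following the same grouping as above; adjust it to your setup:

variable "emr_cluster_applications" {
  type    = list(string)
  default = ["Spark", "Hadoop"]
}

locals {
  # Assumed mapping from application names to the classification blocks
  # they should pull in from the template; adjust to your own needs.
  app_to_classifications = {
    Spark  = ["spark"]
    Hive   = ["hive"]
    Hadoop = ["hbase", "hbase-site"]
  }

  # Flattened list of classifications for the selected applications,
  # used by the local_file resource below in place of the hard-coded lists.
  emr_cluster_applications = flatten([
    for app in var.emr_cluster_applications :
    lookup(local.app_to_classifications, app, [])
  ])
}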

Then, to create a JSON file locally, you could use the local_file resource:

resource "local_file" "emr_template_file" {
  content = jsonencode([for app in local.emr_cluster_applications :
    local.emr_template_mapping["${app}"] if contains(keys(local.emr_template_mapping), "${app}")
    ]
  )
  filename = "${path.root}/template.json"
}

The local_file resource will output a JSON-encoded file which can be used wherever you need it. I am pretty sure there are better ways to do this, so maybe someone else will see this and give a better answer.
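
Also, if the only consumer of that file is the aws_emr_cluster resource inside the same module, you could skip writing the file and pass the filtered list straight in via configurations_json. A rough sketch (the resource name and the omitted required arguments are placeholders):

resource "aws_emr_cluster" "this" {
  # ... name, release_label, service_role, instance settings, etc. omitted ...

  # Same filtering expression as in the local_file resource above,
  # JSON-encoded and handed directly to the cluster.
  configurations_json = jsonencode([
    for app in local.emr_cluster_applications :
    local.emr_template_mapping[app] if contains(keys(local.emr_template_mapping), app)
  ])
}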

Upvotes: 2
