rm916463

Reputation: 35

Multiple values in EMR Cluster Configuration template

Within my EMR module I have a template that is deployed for the cluster configuration. This template holds all of the cluster configuration requirements for the classification types specified in the emr_cluster_applications variable, e.g. Spark, Hadoop, Hive.

Visual:

emr_cluster_applications = ["Spark", "Hadoop", "Hive"]
emr_cluster_configurations = file("./filepath/to/template.json")

This setup works fine; however, moving forward I'm wondering whether the template can be populated based on the values in the emr_cluster_applications variable.

For example, in a separate deployment, if ["Spark", "Hadoop"] were specified instead of all three, the template file would only use the corresponding Spark and Hadoop configuration, with Hive being ignored even though it is still present in the file - is this possible?

Update: Template file:

[
  {
    "Classification": "spark",
    "Properties":{
       "maximizeResourceAllocation": "false",
       "spark.executor.memoryOverhead": "4G"
     }
  },
  {
    "Classification": "hive",
    "Properties":{
      "javax.jdo.option.ConnectionURL": XXXX
      "javax.jdo.option.ConnectionDriverName": XXXX
      "javax.jdo.option.ConnectionUserName": XXXX
      "javax.jdo.option.ConnectionPassword": XXXX
     }
  },
  {
     "Classification": "hbase-site",
     "Properties": {
        "hbase.rootdir": "XXXXXXXXXX"
      }
  },
  {
     "Classification": "hbase",
     "Properties":{
        "hbase.emr.storageMode": "s3"
        "hbase.emr.readreplica.emnabled": "true"
      }
   }
]

Upvotes: 1

Views: 651

Answers (1)

Marko E

Reputation: 18203

This is the best I could come up with and there might be better solutions, so take it with a grain of salt. I had problems mapping Hadoop to two different elements from the JSON, so I had to modify the variables a bit to make it work. I strongly suggest doing any variable manipulation within a locals block in order to avoid clutter in the resources. The locals.tf example:

locals {

  emr_template = [
    {
      "Classification" : "spark",
      "Properties" : {
        "maximizeResourceAllocation" : "false",
        "spark.executor.memoryOverhead" : "4G"
      }
    },
    {
      "Classification" : "hive",
      "Properties" : {
        "javax.jdo.option.ConnectionURL" : "XXXX",
        "javax.jdo.option.ConnectionDriverName" : "XXXX",
        "javax.jdo.option.ConnectionUserName" : "XXXX",
        "javax.jdo.option.ConnectionPassword" : "XXXX"
      }
    },
    {
      "Classification" : "hbase-site",
      "Properties" : {
        "hbase.rootdir" : "XXXXXXXXXX"
      }
    },
    {
      "Classification" : "hbase",
      "Properties" : {
        "hbase.emr.storageMode" : "s3",
        "hbase.emr.readreplica.emnabled" : "true"
      }
    }
  ]

  emr_template_mapping = { for template in local.emr_template : template.Classification => template }
  
  hadoop_enabled           = false
  hadoop                   = local.hadoop_enabled ? ["hbase", "hbase-site"] : []
  apps_enabled             = ["spark", "hive"]
  emr_cluster_applications = concat(local.apps_enabled, local.hadoop)

}

You can control which apps are added with two options:

  1. If Hadoop is enabled, hbase and hbase-site are added to the list of allowed apps. If it is not enabled, the value of the hadoop local will be an empty list.
  2. In the apps_enabled local variable you decide which of the remaining apps you want to enable, i.e., spark, hive, both, or neither.

Finally, the emr_cluster_applications local variable uses concat to concatenate the two lists into one.
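
If you would rather keep driving this from the original emr_cluster_applications variable from the question instead of hard-coded booleans, you could replace the last three locals with something along these lines. The application-to-classification map here is just my assumption, following the same grouping as above; adjust it to your setup:

variable "emr_cluster_applications" {
  type    = list(string)
  default = ["Spark", "Hadoop"]
}

locals {
  # Assumed mapping from application names to the classification blocks
  # they should pull in from the template; adjust to your own needs.
  app_to_classifications = {
    Spark  = ["spark"]
    Hive   = ["hive"]
    Hadoop = ["hbase", "hbase-site"]
  }

  # Flattened list of classifications for the selected applications,
  # used by the local_file resource below in place of the hard-coded lists.
  emr_cluster_applications = flatten([
    for app in var.emr_cluster_applications :
    lookup(local.app_to_classifications, app, [])
  ])
}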

Then, to create a JSON file locally, you could use the local_file resource:

resource "local_file" "emr_template_file" {
  content = jsonencode([for app in local.emr_cluster_applications :
    local.emr_template_mapping["${app}"] if contains(keys(local.emr_template_mapping), "${app}")
    ]
  )
  filename = "${path.root}/template.json"
}

The local_file resource will output a JSON-encoded file which can be used wherever you need it. I am pretty sure there are better ways to do this, so maybe someone else will see this and give a better answer.
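
Also, if the only consumer of that file is the aws_emr_cluster resource inside the same module, you could skip writing the file and pass the filtered list straight in via configurations_json. A rough sketch (the resource name and the omitted required arguments are placeholders):

resource "aws_emr_cluster" "this" {
  # ... name, release_label, service_role, instance settings, etc. omitted ...

  # Same filtering expression as in the local_file resource above,
  # JSON-encoded and handed directly to the cluster.
  configurations_json = jsonencode([
    for app in local.emr_cluster_applications :
    local.emr_template_mapping[app] if contains(keys(local.emr_template_mapping), app)
  ])
}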

Upvotes: 2
