Reputation: 35
Within my EMR module I have a template that is deployed for the cluster configuration. The template contains all of the cluster configuration requirements for each classification type specified in the variable emr_cluster_applications, e.g. Spark, Hadoop, Hive.
Visual:
emr_cluster_applications = ["Spark", "Hadoop", "Hive"]
emr_cluster_configurations = file("./filepath/to/template.json")
This setup works fine; however, moving forward I'm wondering whether the template can be populated based on the values in the emr_cluster_applications variable.
For example, in a separate deployment, if ["Spark", "Hadoop"] were specified rather than all three, the template would use only the corresponding Spark and Hadoop configuration, with Hive being ignored although still present in the file. Is this possible?
Update: Template file:
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "false",
      "spark.executor.memoryOverhead": "4G"
    }
  },
  {
    "Classification": "hive",
    "Properties": {
      "javax.jdo.option.ConnectionURL": "XXXX",
      "javax.jdo.option.ConnectionDriverName": "XXXX",
      "javax.jdo.option.ConnectionUserName": "XXXX",
      "javax.jdo.option.ConnectionPassword": "XXXX"
    }
  },
  {
    "Classification": "hbase-site",
    "Properties": {
      "hbase.rootdir": "XXXXXXXXXX"
    }
  },
  {
    "Classification": "hbase",
    "Properties": {
      "hbase.emr.storageMode": "s3",
      "hbase.emr.readreplica.enabled": "true"
    }
  }
]
Upvotes: 1
Views: 651
Reputation: 18203
This is the best I could come up with, and there might be better solutions, so take it with a grain of salt. I had trouble mapping Hadoop to two different elements of the JSON, so I had to modify the variables to make it work. I strongly suggest doing any variable manipulation within a locals block to avoid clutter in the resources. An example locals.tf:
locals {
  # Full set of configuration blocks, one per classification.
  emr_template = [
    {
      "Classification" : "spark",
      "Properties" : {
        "maximizeResourceAllocation" : "false",
        "spark.executor.memoryOverhead" : "4G"
      }
    },
    {
      "Classification" : "hive",
      "Properties" : {
        "javax.jdo.option.ConnectionURL" : "XXXX",
        "javax.jdo.option.ConnectionDriverName" : "XXXX",
        "javax.jdo.option.ConnectionUserName" : "XXXX",
        "javax.jdo.option.ConnectionPassword" : "XXXX"
      }
    },
    {
      "Classification" : "hbase-site",
      "Properties" : {
        "hbase.rootdir" : "XXXXXXXXXX"
      }
    },
    {
      "Classification" : "hbase",
      "Properties" : {
        "hbase.emr.storageMode" : "s3",
        "hbase.emr.readreplica.enabled" : "true"
      }
    }
  ]

  # Map of classification name => configuration block, for lookup by name.
  emr_template_mapping = { for template in local.emr_template : template.Classification => template }

  # Hadoop maps to two classifications, so it is toggled separately.
  hadoop_enabled = false
  hadoop         = local.hadoop_enabled ? ["hbase", "hbase-site"] : []

  # Remaining classifications to enable.
  apps_enabled = ["spark", "hive"]

  # Final list of classifications to include in the generated template.
  emr_cluster_applications = concat(local.apps_enabled, local.hadoop)
}
You can control which apps will be added with two options:
1. With the hadoop_enabled local variable you decide whether hbase and hbase-site need to be added to the list of allowed apps. If it is not enabled, the value of the hadoop local variable will be an empty list.
2. With the apps_enabled local variable you decide which ones you want to enable, i.e., spark, hive, none, or both.
Finally, in the emr_cluster_applications local variable you would use concat to concatenate the two lists into one.
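If the two switches need to differ between deployments, one option is to expose them as input variables rather than hard-coded locals. The following is only a sketch: the variable names hadoop_enabled and apps_enabled are hypothetical and simply mirror the locals above, and the locals shown here would replace the same-named entries in locals.tf.

```hcl
# Hypothetical input variables replacing the hard-coded locals.
variable "hadoop_enabled" {
  type    = bool
  default = false
}

variable "apps_enabled" {
  type    = list(string)
  default = ["spark", "hive"]
}

locals {
  # Hadoop expands to two classifications when enabled, otherwise to nothing.
  hadoop = var.hadoop_enabled ? ["hbase", "hbase-site"] : []

  # Final list of classifications to include.
  emr_cluster_applications = concat(var.apps_enabled, local.hadoop)
}

# Optional: expose the result so it can be inspected after a plan or apply.
output "emr_cluster_applications" {
  value = local.emr_cluster_applications
}
```

With the defaults the list is ["spark", "hive"]; setting hadoop_enabled = true in a deployment's tfvars file would give ["spark", "hive", "hbase", "hbase-site"].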
Then, to create a JSON file locally, you could use the local_file resource:
resource "local_file" "emr_template_file" {
  # Keep only the configuration blocks whose classification is enabled.
  content = jsonencode([
    for app in local.emr_cluster_applications :
    local.emr_template_mapping[app] if contains(keys(local.emr_template_mapping), app)
  ])
  filename = "${path.root}/template.json"
}
The local_file resource will write a JSON-encoded file which can be used wherever you need it. I am pretty sure there are better ways to do it, so maybe someone else will see this and give a better answer.
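As one possible alternative that keeps the original template file, the JSON could be decoded, filtered, and re-encoded in memory. This is a rough sketch under several assumptions: the template must be valid JSON (quoted values, commas between properties), the path is the placeholder from the question, the local names are hypothetical, and the classification-to-application mapping (including treating hbase and hbase-site as Hadoop) follows the approach above rather than anything confirmed by the question.

```hcl
variable "emr_cluster_applications" {
  type    = list(string)
  default = ["Spark", "Hadoop"]
}

locals {
  # Decode the full template; the path is the placeholder from the question.
  all_configurations = jsondecode(file("./filepath/to/template.json"))

  # Assumed mapping from each classification in the file to the
  # application that should switch it on.
  classification_to_app = {
    "spark"      = "spark"
    "hive"       = "hive"
    "hbase"      = "hadoop"
    "hbase-site" = "hadoop"
  }

  # Normalise the requested applications to lower case for comparison.
  enabled_apps = [for app in var.emr_cluster_applications : lower(app)]

  # Keep only the blocks whose mapped application was requested;
  # classifications missing from the mapping are dropped.
  emr_cluster_configurations = jsonencode([
    for c in local.all_configurations : c
    if contains(local.enabled_apps, lookup(local.classification_to_app, c.Classification, ""))
  ])
}

output "emr_cluster_configurations" {
  value = local.emr_cluster_configurations
}
```

With the default ["Spark", "Hadoop"], the spark, hbase-site, and hbase blocks are kept and the hive block is ignored even though it is still present in the file. The resulting string could then be passed wherever file("./filepath/to/template.json") is used today.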
Upvotes: 2