Reputation: 6393
I'm doing a lot of work with AWS EMR and when you build an EMR cluster through the AWS Management Console you can click a button to export the AWS CLI Command that creates the EMR cluster.
It then gives you a big CLI command that isn't formatted in any way i.e., if you copy and paste the command it's all on a single line.
I'm using these EMR CLI commands, that were created by other individuals, to create the EMR clusters in Python using the AWS SDK Boto3 library i.e., I'm looking at the CLI command to get all the configuration details. Some of the configuration details are present on the AWS Management Console UI but not all of them so it's easier for me to use the CLI command that you can export.
However, the AWS CLI command is very hard to read since it's not formatted. Is there an AWS CLI command formatter available online similar to JSON formatters?
Another solution I could use is to Clone the EMR Cluster and go through the EMR Cluster creation screen on the AWS Management Console to get all the configuration details but I'm still curious if I could format the CLI Command and do it that way. Another added benefit of being able to format the exported CLI command is that I could put it on a Confluence page for documentation.
Upvotes: 1
Views: 749
Reputation: 8523
Here is some quick python code to do it:
import shlex
import json
import re
def format_command(command):
tokens = shlex.split(command)
formatted = ''
for token in tokens:
# Flags get a new line
if token.startswith("--"):
formatted += '\\\n '
# JSON data
if token[0] in ('[', '{'):
json_data = json.loads(token)
data = json.dumps(json_data, indent=4).replace('\n', '\n ')
formatted += "'{}' ".format(data)
# Quote token when it contains whitespace
elif re.match('\s', token):
formatted += "'{}' ".format(token)
# Simple print for remaining tokens
else:
formatted += token + ' '
return formatted
example = """aws emr create-cluster --applications Name=spark Name=ganglia Name=hadoop --tags 'Project=MyProj' --ec2-attributes '{"KeyName":"emr-key","AdditionalSlaveSecurityGroups":["sg-3822994c","sg-ccc76987"],"InstanceProfile":"EMR_EC2_DefaultRole","ServiceAccessSecurityGroup":"sg-60832c2b","SubnetId":"subnet-3c76ee33","EmrManagedSlaveSecurityGroup":"sg-dd832c96","EmrManagedMasterSecurityGroup":"sg-b4923dff","AdditionalMasterSecurityGroups":["sg-3822994c","sg-ccc76987"]}' --service-role EMR_DefaultRole --release-label emr-5.14.0 --name 'Test Cluster' --instance-groups '[{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"m4.xlarge","Name":"Master"},{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m4.xlarge","Name":"CORE"}]' --configurations '[{"Classification":"spark-defaults","Properties":{"spark.sql.avro.compression.codec":"snappy","spark.eventLog.enabled":"true","spark.dynamicAllocation.enabled":"false"},"Configurations":[]},{"Classification":"spark-env","Properties":{},"Configurations":[{"Classification":"export","Properties":{"SPARK_DAEMON_MEMORY":"4g"},"Configurations":[]}]}]' --scale-down-behavior TERMINATE_AT_TASK_COMPLETION --region us-east-1"""
print(format_command(example))
Output looks like this:
aws emr create-cluster \
--applications Name=spark Name=ganglia Name=hadoop \
--tags Project=MyProj \
--ec2-attributes '{
"ServiceAccessSecurityGroup": "sg-60832c2b",
"InstanceProfile": "EMR_EC2_DefaultRole",
"EmrManagedMasterSecurityGroup": "sg-b4923dff",
"KeyName": "emr-key",
"SubnetId": "subnet-3c76ee33",
"AdditionalMasterSecurityGroups": [
"sg-3822994c",
"sg-ccc76987"
],
"AdditionalSlaveSecurityGroups": [
"sg-3822994c",
"sg-ccc76987"
],
"EmrManagedSlaveSecurityGroup": "sg-dd832c96"
}' \
--service-role EMR_DefaultRole \
--release-label emr-5.14.0 \
--name Test Cluster \
--instance-groups '[
{
"EbsConfiguration": {
"EbsBlockDeviceConfigs": [
{
"VolumeSpecification": {
"VolumeType": "gp2",
"SizeInGB": 32
},
"VolumesPerInstance": 1
}
]
},
"InstanceCount": 1,
"Name": "Master",
"InstanceType": "m4.xlarge",
"InstanceGroupType": "MASTER"
},
{
"EbsConfiguration": {
"EbsBlockDeviceConfigs": [
{
"VolumeSpecification": {
"VolumeType": "gp2",
"SizeInGB": 32
},
"VolumesPerInstance": 1
}
]
},
"InstanceCount": 1,
"Name": "CORE",
"InstanceType": "m4.xlarge",
"InstanceGroupType": "CORE"
}
]' \
--configurations '[
{
"Properties": {
"spark.eventLog.enabled": "true",
"spark.dynamicAllocation.enabled": "false",
"spark.sql.avro.compression.codec": "snappy"
},
"Classification": "spark-defaults",
"Configurations": []
},
{
"Properties": {},
"Classification": "spark-env",
"Configurations": [
{
"Properties": {
"SPARK_DAEMON_MEMORY": "4g"
},
"Classification": "export",
"Configurations": []
}
]
}
]' \
--scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
--region us-east-1
Upvotes: 3