Reputation: 39
I want to fetch the ClusterId, ClusterArn, and public DNS of an active EMR cluster and load them into a Postgres table. I am able to get the ClusterId and ARN using CLI commands in the console.
aws emr list-clusters --active --query "Clusters[*].{ClusterId:Id}" --output text
aws emr list-clusters --active --query "Clusters[*].{ClusterArn:ClusterArn}" --output text
After getting the cluster_id, I am able to fetch the DNS using this CLI command:
cluster_id=j-xxx
aws emr describe-cluster --output text --cluster-id $cluster_id --query Cluster.MasterPublicDnsName
But I have to do this in a Python script, and I am not able to integrate these commands into one. As a workaround, I ran the command below and redirected its output to a JSON file.
aws emr list-clusters --active > test.json
Contents of the test.json file:
{
    "Clusters": [
        {
            "Id": "j-xxx",
            "Name": "xxx",
            "Status": {
                "State": "WAITING",
                "StateChangeReason": {
                    "Message": "Cluster ready after last step completed."
                },
                "Timeline": {
                    "CreationDateTime": "2021-12-01T01:08:10.755000-06:00",
                    "ReadyDateTime": "2021-12-01T01:20:13.483000-06:00"
                }
            },
            "NormalizedInstanceHours": 832,
            "ClusterArn": "arn:aws:elasticmapreduce:xxx:xxx:cluster/j-xxx"
        }
    ]
}
I then read that JSON file using Python:
import json
import psycopg2

# Read the file written by "aws emr list-clusters --active > test.json"
with open("test.json") as file:
    data = json.load(file)

CId = data["Clusters"][0]["Id"]
CArn = data["Clusters"][0]["ClusterArn"]
print(CId)
print(CArn)
#CDNS=`aws emr describe-cluster --output text --cluster-id $CId --query Cluster.MasterPublicDnsName`
#print(CDNS)

conn = psycopg2.connect(
    database="postgres", user='xxx', password='xxx', host='xxxx.rds.amazonaws.com', port='5432'
)
cursor = conn.cursor()
query = '''INSERT INTO STAGE.EMR_CLUSTER_INFO (Cluster_ID, Cluster_Arn, Public_DNS) VALUES (%s,%s,%s)'''
# I wasn't able to fetch the DNS, so the value is hardcoded just to test whether the record gets inserted into the table
values = (CId, CArn, 'ip-xxxx.ec2.internal')
cursor.execute(query, values)
conn.commit()
print("Records inserted........")
conn.close()
I was able to insert the record into the table. But I need to fetch the ClusterId, ARN, and DNS in the same script and then load the values into the table. I tried using Boto3 but could not get it to work. Please help. Thanks in advance.
Upvotes: 0
Views: 967
Reputation: 269826
From the boto3 describe_cluster() documentation:
import boto3

emr_client = boto3.client('emr')

# Restrict the listing to active clusters, like "aws emr list-clusters --active"
clusters = emr_client.list_clusters(
    ClusterStates=['STARTING', 'BOOTSTRAPPING', 'RUNNING', 'WAITING', 'TERMINATING']
)

for cluster in clusters['Clusters']:
    cluster_id = cluster['Id']
    response = emr_client.describe_cluster(ClusterId=cluster_id)
    cluster_arn = response['Cluster']['ClusterArn']
    cluster_dns_name = response['Cluster']['MasterPublicDnsName']
    # Insert into database here
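The loop above leaves the database step as a comment. A minimal sketch of how the two halves might be joined is shown below, reusing the INSERT statement and table name from the question. The function names collect_cluster_rows and load_clusters, the ACTIVE_STATES constant, and the conn_params dictionary are hypothetical names chosen for illustration. The sketch assumes AWS credentials and a region are already configured, and it ignores pagination of list_clusters, so it would only see the first page of results.

```python
# States that "aws emr list-clusters --active" treats as active.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING", "TERMINATING"]

INSERT_SQL = (
    "INSERT INTO STAGE.EMR_CLUSTER_INFO (Cluster_ID, Cluster_Arn, Public_DNS) "
    "VALUES (%s, %s, %s)"
)


def collect_cluster_rows(emr_client):
    """Return (cluster_id, cluster_arn, public_dns) tuples for active clusters.

    emr_client is anything exposing list_clusters and describe_cluster,
    normally boto3.client("emr").
    """
    rows = []
    listing = emr_client.list_clusters(ClusterStates=ACTIVE_STATES)
    for summary in listing["Clusters"]:
        detail = emr_client.describe_cluster(ClusterId=summary["Id"])["Cluster"]
        # Use .get() for the DNS name in case the field is missing from the
        # response (assumed possible for a cluster that is still starting).
        rows.append(
            (detail["Id"], detail["ClusterArn"], detail.get("MasterPublicDnsName"))
        )
    return rows


def load_clusters(conn_params):
    """Fetch active clusters and insert one row per cluster into Postgres.

    conn_params is a dict of psycopg2.connect keyword arguments
    (database, user, password, host, port).
    """
    # Imported here so collect_cluster_rows can be exercised without
    # boto3 or psycopg2 installed.
    import boto3
    import psycopg2

    rows = collect_cluster_rows(boto3.client("emr"))
    conn = psycopg2.connect(**conn_params)
    try:
        with conn.cursor() as cursor:
            cursor.executemany(INSERT_SQL, rows)
        conn.commit()
    finally:
        conn.close()
    return len(rows)
```

Calling load_clusters with the same connection values used in the question (database, user, password, host, port) would then replace both the JSON file and the hardcoded DNS value. Keeping the AWS lookup in its own function is a design choice that lets it be checked with a stand-in client before touching the real database.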
Upvotes: 1
Reputation: 76
Is this what you are looking for? See "How do I list all running EMR clusters using Boto?" and https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html#client
With boto3 you can fetch the MasterPublicDnsName, Id, and ClusterArn. If you need to know more, let me know.
Upvotes: 0