Sonu

Reputation: 39

AWS CLI commands using Python

I want to fetch the ClusterId, ClusterArn and public DNS of an active EMR cluster and load them into a Postgres table. I am able to get the ClusterId and ClusterArn using CLI commands in the console.

aws emr list-clusters --active --query "Clusters[*].{ClusterId:Id}" --output text
aws emr list-clusters --active --query "Clusters[*].{ClusterArn:ClusterArn}" --output text
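A rough boto3 counterpart of these two commands, as a sketch rather than a drop-in replacement. It assumes `--active` is shorthand for the states STARTING, BOOTSTRAPPING, RUNNING, WAITING and TERMINATING (which is how the AWS CLI documents the shortcut), and `ids_and_arns` / `list_active_clusters` are illustrative names, not part of boto3. The client is passed in, so the functions themselves need no credentials; going through the paginator also covers the case where there are more clusters than fit in one response page.

```python
# Sketch of a boto3 counterpart of `aws emr list-clusters --active`.
# ACTIVE_STATES is assumed to match what the CLI's --active shortcut expands to.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING", "TERMINATING"]


def ids_and_arns(pages):
    """Flatten list_clusters response pages into (ClusterId, ClusterArn) tuples."""
    return [
        (summary["Id"], summary["ClusterArn"])
        for page in pages
        for summary in page["Clusters"]
    ]


def list_active_clusters(emr_client):
    """Return (ClusterId, ClusterArn) for every active cluster, across all pages.

    emr_client is expected to be a boto3 EMR client, e.g. boto3.client("emr").
    """
    paginator = emr_client.get_paginator("list_clusters")
    return ids_and_arns(paginator.paginate(ClusterStates=ACTIVE_STATES))
```

Usage would look like `for cluster_id, cluster_arn in list_active_clusters(boto3.client("emr")): ...`, which requires AWS credentials and a default region to be configured.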

After getting the cluster_id, I am able to fetch the DNS using this CLI command:

cluster_id=j-xxx
aws emr describe-cluster --output text --cluster-id $cluster_id --query Cluster.MasterPublicDnsName
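In boto3, `describe-cluster` maps to `describe_cluster(ClusterId=...)`, and `--query Cluster.MasterPublicDnsName` corresponds to reading that key from the returned dict. A minimal sketch; `master_public_dns` is a hypothetical helper, and the client is passed in (for example `boto3.client("emr")`). It uses `.get()` on the assumption that the key can be missing, for instance before the master node has been provisioned.

```python
def master_public_dns(emr_client, cluster_id):
    """Return the master node's public DNS name, or None if it is not set.

    Rough equivalent of:
      aws emr describe-cluster --cluster-id <id> --query Cluster.MasterPublicDnsName
    """
    response = emr_client.describe_cluster(ClusterId=cluster_id)
    # .get() instead of indexing, in case the DNS name has not been assigned yet
    return response["Cluster"].get("MasterPublicDnsName")
```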

But I have to do this in a Python script, and I am not able to integrate these commands into one. As a workaround, I ran the command below and redirected the output to a JSON file.

aws emr list-clusters --active > test.json
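Instead of going through a file, the same command can be run from inside the Python script with the standard library's `subprocess` module. A minimal sketch, assuming the `aws` executable is on PATH and already configured, and Python 3.7+ (for `capture_output`); `aws_command` and `run_aws` are hypothetical helper names. Passing the arguments as a list avoids the shell, so there are no quoting problems with `--query` expressions.

```python
import json
import subprocess


def aws_command(*args):
    """Build an argv list for the AWS CLI that always requests JSON output."""
    return ["aws", *args, "--output", "json"]


def run_aws(*args):
    """Run an AWS CLI command and return its parsed JSON output.

    check=True raises subprocess.CalledProcessError if the CLI exits non-zero.
    """
    result = subprocess.run(
        aws_command(*args), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

For example, `run_aws("emr", "list-clusters", "--active")["Clusters"]` should give the same list that ends up in test.json, and `run_aws("emr", "describe-cluster", "--cluster-id", cluster_id)["Cluster"].get("MasterPublicDnsName")` the DNS name.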

Contents of the test.json file:

{
    "Clusters": [
        {
            "Id": "j-xxx",
            "Name": "xxx",
            "Status": {
                "State": "WAITING",
                "StateChangeReason": {
                    "Message": "Cluster ready after last step completed."
                },
                "Timeline": {
                    "CreationDateTime": "2021-12-01T01:08:10.755000-06:00",
                    "ReadyDateTime": "2021-12-01T01:20:13.483000-06:00"
                }
            },
            "NormalizedInstanceHours": 832,
            "ClusterArn": "arn:aws:elasticmapreduce:xxx:xxx:cluster/j-xxx"
        }
    ]
}
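If this output is read by indexing the first element of "Clusters", only the first cluster is picked up, and an IndexError is raised when no cluster is active. A sketch that collects every (Id, ClusterArn) pair instead; `clusters_from_json` is an illustrative name.

```python
import json


def clusters_from_json(text):
    """Return (ClusterId, ClusterArn) pairs from `aws emr list-clusters` JSON output.

    An empty or missing "Clusters" list yields [] instead of an IndexError.
    """
    data = json.loads(text)
    return [(c["Id"], c["ClusterArn"]) for c in data.get("Clusters", [])]
```

Usage with the file above: `with open("test.json") as f: pairs = clusters_from_json(f.read())`.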

Then I read that JSON file in Python:

import json
import psycopg2

# Output of: aws emr list-clusters --active > test.json
with open("test.json") as file:
    data = json.load(file)

CId = data["Clusters"][0]["Id"]
CArn = data["Clusters"][0]["ClusterArn"]
print(CId)
print(CArn)

# What I want to do (shell-style command substitution, which does not work in Python):
# CDNS = `aws emr describe-cluster --output text --cluster-id $CId --query Cluster.MasterPublicDnsName`
# print(CDNS)

conn = psycopg2.connect(
    database="postgres", user="xxx", password="xxx", host="xxxx.rds.amazonaws.com", port="5432"
)
cursor = conn.cursor()
query = """INSERT INTO STAGE.EMR_CLUSTER_INFO (Cluster_ID, Cluster_Arn, Public_DNS) VALUES (%s, %s, %s)"""
# I wasn't able to fetch the DNS, so it is hardcoded here just to test whether the row gets inserted
values = (CId, CArn, "ip-xxxx.ec2.internal")
cursor.execute(query, values)
conn.commit()
print("Records inserted........")
cursor.close()
conn.close()

I was able to insert the record into the table, but I need to fetch the ClusterId, ClusterArn and DNS in the same script and then load the values into the table. I tried using Boto3 but could not get it to work. Please help. Thanks in advance.
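Once the three values are available, the insert itself can stay close to the code above. A sketch of a variant that writes any number of rows in one transaction with `executemany`; `insert_cluster_rows` is a hypothetical helper, and `conn` is assumed to be a psycopg2 connection like the one created in the question. In psycopg2, `with conn:` commits when the block succeeds and rolls back on an exception, but it does not close the connection.

```python
INSERT_SQL = (
    "INSERT INTO STAGE.EMR_CLUSTER_INFO (Cluster_ID, Cluster_Arn, Public_DNS) "
    "VALUES (%s, %s, %s)"
)


def insert_cluster_rows(conn, rows):
    """Insert (cluster_id, cluster_arn, public_dns) tuples in one transaction.

    Returns the number of rows passed in.
    """
    with conn:  # commit on success, roll back on exception
        with conn.cursor() as cursor:  # cursor is closed when the block exits
            cursor.executemany(INSERT_SQL, rows)
    return len(rows)
```

`conn.close()` still has to be called once the script is done with the connection.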

Upvotes: 0

Views: 967

Answers (2)

John Rotenstein

Reputation: 269826

You can get all three values with boto3, using describe_cluster():

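import boto3

emr_client = boto3.client('emr')

# Only active clusters (similar to `aws emr list-clusters --active`);
# without ClusterStates, terminated clusters are listed as well.
clusters = emr_client.list_clusters(
    ClusterStates=['STARTING', 'BOOTSTRAPPING', 'RUNNING', 'WAITING', 'TERMINATING']
)

for cluster in clusters['Clusters']:
    cluster_id = cluster['Id']

    response = emr_client.describe_cluster(ClusterId=cluster_id)

    cluster_arn = response['Cluster']['ClusterArn']
    # .get(): the DNS name may not be set yet, e.g. while the cluster is starting
    cluster_dns_name = response['Cluster'].get('MasterPublicDnsName')

    # Insert into database here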
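The "Insert into database here" step is simpler if the values are first collected into rows. A sketch under a few assumptions: `collect_cluster_rows` is a hypothetical name, `ACTIVE_STATES` mirrors what the CLI's `--active` shortcut is documented to expand to, and the client is passed in (e.g. `boto3.client("emr")`). It follows the `Marker` that list_clusters returns when more results are available, and stores None when a cluster has no MasterPublicDnsName yet.

```python
# Assumed to match the AWS CLI's --active shortcut.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING", "TERMINATING"]


def collect_cluster_rows(emr_client):
    """Build (ClusterId, ClusterArn, MasterPublicDnsName) rows for active clusters.

    Keeps calling list_clusters with the returned Marker until no Marker is left.
    """
    rows = []
    kwargs = {"ClusterStates": ACTIVE_STATES}
    while True:
        page = emr_client.list_clusters(**kwargs)
        for summary in page["Clusters"]:
            cluster = emr_client.describe_cluster(ClusterId=summary["Id"])["Cluster"]
            rows.append(
                (cluster["Id"], cluster["ClusterArn"], cluster.get("MasterPublicDnsName"))
            )
        marker = page.get("Marker")
        if not marker:
            return rows
        kwargs["Marker"] = marker
```

The resulting list can then be passed to `cursor.executemany(query, rows)` with the INSERT statement from the question.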

Upvotes: 1

Prakitidev Verma

Reputation: 76

Is this what you are looking for? See the question "How do I list all running EMR clusters using Boto?" and the boto3 EMR client documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html#client

From boto you can fetch the MasterPublicDnsName, Id and ClusterArn. Let me know if you need more information.

Upvotes: 0
