Reputation: 3456
My use case is: I work with sensitive data in an AWS EMR Notebook (Python/PySpark/Spark kernels) attached to an AWS EMR cluster in an enterprise environment (hence limited permissions). Sometimes, I queue up a number of cells that will take a variable amount of time to complete. Rather than constantly monitor my notebook waiting for the cells to finish before stopping my notebook and terminating my cluster, I'd like to be able to queue up a cell that terminates the AWS EMR cluster.
To that end, based on reading other SO posts/documentation, I need my cluster-id. It also seems that to find that, I need an instance ID.
The following code works for me:
!wget -q -O - http://169.254.169.254/latest/meta-data/instance-id
But I could not get commands like the following to work, due to command-not-found or permission-denied errors. This one fails with the message shown below it:
!aws emr list-clusters --active --query "Clusters[*].{Name:Name}" --output text
You must specify a region. You can also configure your region by running "aws configure".
I am looking for a simple command or set of commands that I can put in one cell and run to terminate the cluster my notebook is attached to. At the beginning of each workday I create a new cluster with a new name, and at the end of the workday I terminate it. Given this, I'd like the command not to have to change every time my cluster name changes. I'd also like to avoid having to run aws configure if possible.
References:
AWS CLI EMR get Master node Instance ID and tag it
List all "Active" EMR cluster using Boto3
Get AWS EMR Cluster ID from Name
Upvotes: 1
Views: 634
Reputation: 911
This works on emr-5.30.1 in an EMR Notebook:
import os
emr_cluster_id = os.environ.get('EMR_CLUSTER_ID')
emr_step_id = os.environ.get('EMR_STEP_ID')
To be safe, if you want an exception (a KeyError) to be raised when EMR_CLUSTER_ID is not defined, use this instead:
emr_cluster_id = os.environ['EMR_CLUSTER_ID']
See the Spark History server Environment tab for other useful variables.
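If the variable may be missing (for example on a different release), the lookup can be wrapped in a small helper. This is only a sketch: the name `get_cluster_id` is made up for illustration, and the fallback to `/mnt/var/lib/info/job-flow.json` with a `jobFlowId` key is an assumption about what is readable from the node running your kernel, which I have not verified inside an EMR Notebook.

```python
import json
import os


def get_cluster_id(environ=None, info_path="/mnt/var/lib/info/job-flow.json"):
    """Return the EMR cluster id, or raise RuntimeError if it cannot be found.

    `get_cluster_id` is a hypothetical helper name, not part of any AWS library.
    """
    environ = os.environ if environ is None else environ

    # Preferred source: the environment variable used in the answer above.
    cluster_id = environ.get("EMR_CLUSTER_ID")
    if cluster_id:
        return cluster_id

    # Assumed fallback: a JSON info file on the cluster node that contains
    # a "jobFlowId" entry. Adjust or remove if it does not exist for you.
    try:
        with open(info_path) as fh:
            return json.load(fh)["jobFlowId"]
    except (OSError, KeyError, TypeError, ValueError) as exc:
        raise RuntimeError("Could not determine the EMR cluster id") from exc
```

With this, `emr_cluster_id = get_cluster_id()` either yields an id or fails loudly before any termination call is attempted.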
Then, to terminate your cluster, the code below should work (untested).
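# Install boto3 for this notebook session (PySpark kernel) if it is not already available.
sc.install_pypi_package("boto3")
import boto3
# boto3 needs a region; pass region_name to boto3.client if no default is configured.
client = boto3.client('emr')
# Request termination of the cluster identified by emr_cluster_id (read above).
response = client.terminate_job_flows(
    JobFlowIds=[emr_cluster_id]
)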
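Since the question hit "You must specify a region" and wants to avoid `aws configure`, one option is to read the region from the same instance metadata service that already answered the `wget` call. The following is a hedged sketch, not a tested recipe: the helper names are hypothetical, it assumes the metadata endpoint is reachable from wherever the kernel runs and accepts plain (IMDSv1-style) requests, and it assumes boto3 is installed and the cluster's instance role is allowed to call `elasticmapreduce:TerminateJobFlows`.

```python
import json
import urllib.request

# Instance identity document served by the EC2 instance metadata service.
IDENTITY_URL = "http://169.254.169.254/latest/dynamic/instance-identity/document"


def region_from_identity_document(document_text):
    """Extract the 'region' field from an instance identity document (JSON)."""
    return json.loads(document_text)["region"]


def fetch_region(url=IDENTITY_URL, timeout=2):
    """Ask the metadata service which region this instance runs in.

    Assumption: unauthenticated GET requests are accepted. If the instance
    enforces IMDSv2, a session token is required, which is not handled here.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return region_from_identity_document(resp.read().decode("utf-8"))


def terminate_cluster(cluster_id, region=None):
    """Request termination of the given EMR cluster (hypothetical helper)."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("emr", region_name=region or fetch_region())
    return client.terminate_job_flows(JobFlowIds=[cluster_id])
```

Queued as the last cell, `terminate_cluster(emr_cluster_id)` would then not depend on the cluster name or on a configured default region. If termination protection is enabled on the cluster, the request is expected to be rejected.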
Upvotes: 0