Reputation: 75
How can I create alarms for the following cases? 1) Alert when an EC2 instance runs for too long (say, more than 1 hour). 2) Alert when the number of running EC2 instances reaches a threshold (say, 5 instances at a time).
One more requirement: the alerts should apply only to specific EC2 instances, say those whose instance name starts with "test".
When I try to create the alarm, I don't see anything like this among the available metrics. The standard metrics include CPU Utilization, Network In, Network Out, etc.
Is there a way to create these alarms, either by defining custom metrics or by some other means?
Upvotes: 3
Views: 1553
Reputation: 190
I recently implemented a solution (see GitHub repo) that creates alarms for EC2 instances based on runtime; the same approach can be adapted for instance count. Here's a simplified version of the Lambda function:
import boto3
from datetime import datetime, timezone

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    # Specify the desired runtime threshold in hours
    runtime_threshold = 1
    # Specify the desired instance count threshold
    instance_count_threshold = 5
    # Get all running EC2 instances whose Name tag starts with "test"
    instances = ec2.describe_instances(Filters=[
        {'Name': 'instance-state-name', 'Values': ['running']},
        {'Name': 'tag:Name', 'Values': ['test*']}
    ])
    instance_count = 0
    for reservation in instances['Reservations']:
        for instance in reservation['Instances']:
            instance_count += 1
            # Calculate runtime (LaunchTime is a timezone-aware UTC datetime)
            launch_time = instance['LaunchTime']
            current_time = datetime.now(timezone.utc)
            runtime = current_time - launch_time
            runtime_hours = runtime.total_seconds() / 3600
            if runtime_hours > runtime_threshold:
                # Send runtime alert
                send_alert(f"Instance {instance['InstanceId']} has been running for {runtime_hours:.2f} hours.")
    if instance_count > instance_count_threshold:
        # Send instance count alert
        send_alert(f"There are currently {instance_count} running instances.")

def send_alert(message):
    # Implement your alert mechanism here (e.g., SNS, email)
    print(message)
This Lambda function retrieves all running EC2 instances with names starting with "test", calculates their runtime, and sends alerts if the runtime exceeds the specified threshold. It also sends an alert if the total count of relevant instances exceeds the specified threshold.
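One caveat worth noting: a single describe_instances call returns only one page of results, so with many instances the count could come out too low. Below is a minimal sketch that uses boto3's paginator instead. The helper name iter_instances and its name_prefix parameter are my own (hypothetical), and the EC2 client is passed in rather than created inside the function.

```python
def iter_instances(ec2, name_prefix="test"):
    """Yield every running instance whose Name tag starts with name_prefix.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    paginator = ec2.get_paginator("describe_instances")
    pages = paginator.paginate(Filters=[
        {"Name": "instance-state-name", "Values": ["running"]},
        {"Name": "tag:Name", "Values": [name_prefix + "*"]},
    ])
    # Walk every page so that no instance is missed
    for page in pages:
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                yield instance
```

Inside the handler, the two nested loops could then be replaced with a single `for instance in iter_instances(ec2):`.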
Note: Make sure to replace the send_alert function with your desired alert mechanism (e.g., SNS, email).
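As one possible way to do that with SNS, here is a rough sketch. The factory name make_send_alert, the default subject and the topic ARN in the comment are placeholders I made up; only the sns.publish call is the real boto3 API.

```python
def make_send_alert(sns, topic_arn, subject="EC2 alert"):
    """Build a send_alert(message) function that publishes to an SNS topic.

    `sns` is expected to be boto3.client("sns"); `topic_arn` is the ARN of
    a topic you have already created and subscribed to.
    """
    def send_alert(message):
        # SNS requires subjects shorter than 100 characters
        sns.publish(TopicArn=topic_arn, Subject=subject[:99], Message=message)
    return send_alert

# Example wiring (placeholder ARN, not a real topic):
# import boto3
# send_alert = make_send_alert(
#     boto3.client("sns"), "arn:aws:sns:us-east-1:123456789012:ec2-alerts")
```

Keeping the one-argument send_alert(message) signature means the rest of the Lambda code does not have to change.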
I hope this helps!
Upvotes: -1
Reputation: 5103
You can publish a custom metric to CloudWatch (for example, the number of running instances) and then set an alarm on that metric.
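A minimal sketch of what that could look like with boto3. The namespace, metric name, alarm name and the five-minute period are assumptions chosen for illustration, and something (e.g. a scheduled Lambda) still has to count the instances and call publish_instance_count regularly.

```python
NAMESPACE = "Custom/EC2"              # hypothetical namespace
METRIC_NAME = "RunningTestInstances"  # hypothetical metric name

def publish_instance_count(cloudwatch, count):
    """Publish the current instance count as a custom metric.

    `cloudwatch` is expected to be boto3.client("cloudwatch").
    """
    cloudwatch.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[{"MetricName": METRIC_NAME, "Value": count, "Unit": "Count"}],
    )

def create_count_alarm(cloudwatch, threshold, topic_arn):
    """Create an alarm that fires when the count reaches `threshold`."""
    cloudwatch.put_metric_alarm(
        AlarmName="too-many-test-instances",  # hypothetical alarm name
        Namespace=NAMESPACE,
        MetricName=METRIC_NAME,
        Statistic="Maximum",
        Period=300,                # seconds; assumes a publish every 5 minutes
        EvaluationPeriods=1,
        Threshold=threshold,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=[topic_arn],  # e.g. an SNS topic to notify
    )
```

The alarm only needs to be created once; after that, each published data point is evaluated against the threshold by CloudWatch itself.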
Upvotes: 0
Reputation: 543
For automatically deployed instances it's impossible to set up a CloudWatch alarm in advance, because you do not know the instance ID. The only way I found to set up an alarm was to create an AWS Lambda function that polls all the running instances and compares their launch time to a specified timeout.
The Lambda function is triggered periodically by a CloudWatch Events rule.
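If you prefer to create the schedule from code rather than the console, something along these lines should work. This is only a sketch: the rule name, the rate(5 minutes) schedule and the helper name schedule_lambda are assumptions, and the two clients are passed in.

```python
def schedule_lambda(events, lambda_client, function_arn,
                    rule_name="ec2-runtime-monitor",
                    schedule="rate(5 minutes)"):
    """Create a scheduled rule that invokes the Lambda function.

    `events` is expected to be boto3.client("events") and `lambda_client`
    boto3.client("lambda"); `function_arn` is the ARN of the monitor Lambda.
    """
    rule_arn = events.put_rule(
        Name=rule_name, ScheduleExpression=schedule, State="ENABLED"
    )["RuleArn"]
    # Allow the rule to invoke the function
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    # Point the rule at the function
    events.put_targets(Rule=rule_name,
                       Targets=[{"Id": "1", "Arn": function_arn}])
    return rule_arn
```

Note that add_permission fails if a statement with the same StatementId already exists, so this is meant as a one-time setup step rather than something to rerun on every deploy.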
Use tags to specify different run durations for different machines. For example, your launch tool should tag each instance with a key/value such as “Test”.
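One way to get per-machine durations is to read the timeout straight from a tag. The sketch below assumes a hypothetical tag key MaxRuntimeHours whose value is a number of hours, and falls back to a default when the tag is missing or not numeric.

```python
DEFAULT_TIMEOUT_HOURS = 24  # assumed fallback, adjust to taste

def timeout_hours_for(tags, key="MaxRuntimeHours",
                      default=DEFAULT_TIMEOUT_HOURS):
    """Return the timeout in hours taken from an instance's tag list.

    `tags` is the list of {"Key": ..., "Value": ...} dicts that boto3
    returns for an instance, or None for an untagged instance.
    """
    for tag in tags or []:
        if tag["Key"] == key:
            try:
                return float(tag["Value"])
            except ValueError:
                # Tag present but not a number: use the default
                return default
    return default
```

For example, `timeout_hours_for(instance.tags)` could be passed as the timeout when checking whether an instance has run too long.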
Please note this code comes with NO warranties at all! This is more of an example.
import boto3
import datetime
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

ec2_client = boto3.client('ec2')
INSTANCE_TIMEOUT = 24  # hours
# Not defined in the original post; assumed timeout (hours) for untagged instances
UNKNOWN_INSTANCE_TIMEOUT = 24
MAX_PERMITTED_INSTANCES = 5
MAILING_LIST = "[email protected], [email protected]"

def parse_tag(tags, keyValuePair):
    # Return True if the tag list contains the given (key, value) pair
    for tag in tags:
        if tag['Key'] == keyValuePair[0] and tag['Value'] == keyValuePair[1]:
            return True
    return False

def runtimeExceeded(instance, timeOutHours):
    # Working in UTC to avoid time-travel during daylight-saving changeover
    timeNow = datetime.datetime.now(datetime.timezone.utc)
    instanceRuntime = timeNow - instance.launch_time
    print(instanceRuntime)
    return instanceRuntime > datetime.timedelta(hours=timeOutHours)

def sendAlert(instance, message):
    # instance may be None for alerts that do not concern a single instance
    msg = MIMEMultipart()
    msg['From'] = '[email protected]'
    msg['To'] = MAILING_LIST
    msg['Subject'] = "AWS Alert: " + message
    bodyText = '\n\nThis message was sent by the AWS Monitor ' + \
        'Lambda. For details see AwsConsole-Lambdas. \n\nIf you want to ' + \
        'exclude an instance from this monitor, tag it ' + \
        'with Key=RuntimeMonitor Value=False'
    details = ''
    if instance is not None:
        details = '\nInstance ID: ' + str(instance.instance_id) + \
            '\nIn Availability zone: ' + \
            str(instance.placement['AvailabilityZone'])
    messageBody = MIMEText(message + details + bodyText)
    msg.attach(messageBody)
    ses = boto3.client('ses')
    ses.send_raw_email(RawMessage={'Data': msg.as_string()})

def lambda_handler(event, context):
    aws_regions = ec2_client.describe_regions()['Regions']
    for region in aws_regions:
        runningInstancesCount = 0
        try:
            ec2_resource = boto3.resource('ec2',
                                          region_name=region['RegionName'])
            for i in ec2_resource.instances.all():
                if i.state['Name'] != 'running':
                    continue
                runningInstancesCount += 1
                if i.tags is not None:
                    if parse_tag(i.tags, ('RuntimeMonitor', 'False')):
                        # Ignore these instances
                        continue
                    if runtimeExceeded(i, INSTANCE_TIMEOUT):
                        sendAlert(i, "An EC2 instance has been running "
                                  "for over {0} hours".format(INSTANCE_TIMEOUT))
                else:
                    print("Untagged instance")
                    if runtimeExceeded(i, UNKNOWN_INSTANCE_TIMEOUT):
                        sendAlert(i, "An EC2 instance has been running "
                                  "for over {0} hours".format(UNKNOWN_INSTANCE_TIMEOUT))
        except Exception as e:
            print(e)
            continue
        # The count is checked per region; no single instance is attached
        if runningInstancesCount > MAX_PERMITTED_INSTANCES:
            sendAlert(None, "Number of running instances exceeded threshold: "
                      "{0} running instances in {1}".format(
                          runningInstancesCount, region['RegionName']))
    return True
Upvotes: 2