Problem with AWS autoscaling lifecycle hook integrated with lambda (python) and jenkins

Question

I hope someone can help me crack this one. I'm not an advanced Python user and have only just started with AWS services such as Auto Scaling group (ASG) lifecycle hooks and Amazon EventBridge. I'm trying to achieve the following:

The desired capacity ('Desired state') of the ASG changes and a new instance is launched.
The Auto Scaling event is fed to the EventBridge event bus (I use the default bus), where a rule looks for events with source "aws.autoscaling" and detail type "EC2 Instance-launch Lifecycle Action".
The event payload is forwarded to a Lambda function as the target.
The Python Lambda function performs the following:
- discovers the instance's private IP
- makes a remote API call to Jenkins with two parameters, the instance ID and IP
- notifies the ASG's lifecycle hook with 'CONTINUE' if the Jenkins call succeeded, 'ABANDON' otherwise
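
To make the steps above concrete, here is a minimal sketch of the fields the handler reads from the event. The sample values (token, instance ID) are made up for illustration, and `extract_lifecycle_fields` is a hypothetical helper, not part of my function; only the key names match what the handler further down uses.

```python
# Hypothetical, trimmed "EC2 Instance-launch Lifecycle Action" event.
# All values are placeholders; only the key names matter here.
sample_event = {
    "source": "aws.autoscaling",
    "detail-type": "EC2 Instance-launch Lifecycle Action",
    "detail": {
        "LifecycleActionToken": "00000000-0000-0000-0000-000000000000",
        "AutoScalingGroupName": "prod-use1-test",
        "LifecycleHookName": "LogAutoScalingEventHook",
        "EC2InstanceId": "i-0123456789abcdef0",
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
    },
}


def extract_lifecycle_fields(event):
    """Pull out the values needed to call Jenkins and complete the hook."""
    detail = event["detail"]
    return {
        "hook_name": detail["LifecycleHookName"],
        "asg_name": detail["AutoScalingGroupName"],
        "instance_id": detail["EC2InstanceId"],
        "token": detail["LifecycleActionToken"],
    }


fields = extract_lifecycle_fields(sample_event)
print(fields["instance_id"])
```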

The problem appears when the 'Desired state' of the ASG is increased by more than one unit. The Lambda function is then invoked multiple times for each provisioned instance, which generates multiple Jenkins API calls. Something may be missing or misconfigured in the Lambda function, or the event bus rule pattern may need to be different (I cannot see any other options). What I need is one Jenkins remote API call per instance or, if possible, one API call per 'Desired state' change that covers all instances created by that change.
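
One possible way to express the "one Jenkins call per instance" requirement is to treat the lifecycle action token as an idempotency key, so that a repeated delivery of the same event is ignored. This is only a sketch under assumptions: `make_deduplicating_trigger` and `trigger_jenkins` are hypothetical names, and the in-memory set is a stand-in. Separate Lambda invocations do not reliably share memory, so a real version would need a shared store with an atomic "insert if absent" operation.

```python
def make_deduplicating_trigger(trigger_jenkins, seen_tokens=None):
    """Wrap a Jenkins trigger so each lifecycle token fires it at most once.

    trigger_jenkins: hypothetical callable taking an instance ID.
    seen_tokens: stand-in for a shared store of already handled tokens.
    """
    seen = set() if seen_tokens is None else seen_tokens

    def handle(detail):
        token = detail["LifecycleActionToken"]
        if token in seen:
            # Same lifecycle action delivered again: skip the Jenkins call.
            return False
        seen.add(token)
        trigger_jenkins(detail["EC2InstanceId"])
        return True

    return handle


# Usage: the second delivery of the same event does not trigger Jenkins.
calls = []
handle = make_deduplicating_trigger(calls.append)
detail = {
    "LifecycleActionToken": "00000000-0000-0000-0000-000000000000",
    "EC2InstanceId": "i-0123456789abcdef0",
}
first = handle(detail)
second = handle(detail)
print(calls)
```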

The setup is shown below:

LifecycleHook

{
    "LifecycleHooks": [
        {
            "LifecycleHookName": "LogAutoScalingEventHook",
            "AutoScalingGroupName": "prod-use1-test",
            "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
            "HeartbeatTimeout": 300,
            "GlobalTimeout": 30000,
            "DefaultResult": "ABANDON"
        }
    ]
}

Events rule pattern

{
    "Name": "LogAutoScalingEventRule",
    "Arn": "arn:aws:events:us-east-1:xxxxxxxxxxxx:rule/LogAutoScalingEventRule",
    "EventPattern": "{\"source\":[\"aws.autoscaling\"],\"detail-type\":[\"EC2 Instance-launch Lifecycle Action\"]}",
    "State": "ENABLED",
    "EventBusName": "default",
    "CreatedBy": "xxxxxxxxxxxx"
}
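
For readability, the escaped "EventPattern" string above can be decoded and checked against a sample event. The `matches` function is a deliberately simplified, hypothetical matcher for top-level keys only; it is not how EventBridge itself evaluates patterns.

```python
import json

# The EventPattern value from the rule above, with the JSON escaping removed.
event_pattern = json.loads(
    '{"source":["aws.autoscaling"],'
    '"detail-type":["EC2 Instance-launch Lifecycle Action"]}'
)


def matches(pattern, event):
    """Simplified check: each pattern key must be present in the event
    with one of the listed values (top-level keys only)."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


launch_event = {
    "source": "aws.autoscaling",
    "detail-type": "EC2 Instance-launch Lifecycle Action",
    "detail": {"AutoScalingGroupName": "prod-use1-test"},
}
terminate_event = dict(
    launch_event, **{"detail-type": "EC2 Instance-terminate Lifecycle Action"}
)

print(matches(event_pattern, launch_event))
print(matches(event_pattern, terminate_event))
```

Note that the pattern only constrains the source and detail type. It does not filter on the group name, so launch lifecycle events from any Auto Scaling group in the same account and region would also match this rule.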

Events rule target

{
    "Targets": [
        {
            "Id": "Id61f23d81-3a04-403b-bb73-9ea5f8b8e4d8",
            "Arn": "arn:aws:lambda:us-east-1:xxxxxxxxxxxx:function:jenkins-function"
        }
    ]
}

lambda infrastructure

{
    "Configuration": {
        "FunctionName": "jenkins-function",
        "FunctionArn": "arn:aws:lambda:us-east-1:xxxxxxxxxxxx:function:jenkins-function",
        "Runtime": "python3.9",
        "Role": "arn:aws:iam::xxxxxxxxxxxx:role/service-role/HOME_LogAutoScalingEventRole",
        "Handler": "index.jenkins_handler",
        "CodeSize": 960531,
        "Description": "",
        "Timeout": 3,
        "MemorySize": 128,
        "LastModified": "2023-04-05T15:56:50.713+0000",
        "CodeSha256": "8TNg+VrvSFiDxFCBohUNsBmuertrpqjyqQCqOdOf1ss=",
        "Version": "$LATEST",
        "VpcConfig": {
            "SubnetIds": [
                "subnet-xxxxxxxxxxxxxx"
            ],
            "SecurityGroupIds": [
                "sg-xxxxxxxxxxxxxx"
            ],
            "VpcId": "vpc-xxxxxxxxxxxxxx"
        },
        "Environment": {
            "Variables": {
                "API_TOKEN": "jenkins_token",
                "JENKINS_PORT": "jenkins_port",
                "USERNAME": "jenkins_user",
                "JENKINS_URL": "jenkins_url"
            }
        },
        "TracingConfig": {
            "Mode": "PassThrough"
        },
        "RevisionId": "4118e398-9ef2-41f5-8d67-fd9bbf256cae",
        "State": "Active",
        "LastUpdateStatus": "Successful",
        "PackageType": "Zip",
        "Architectures": [
            "x86_64"
        ],
        "EphemeralStorage": {
            "Size": 512
        }
    },
    "Code": {
        "RepositoryType": "S3",
        "Location": "code_location"
    }
}

lambda function

import logging
import json
import os
import boto3
import requests

logger = logging.getLogger("asg-instance-launch")
logger.setLevel(logging.DEBUG)

RUNTIME_REGION = os.environ['AWS_REGION']
USERNAME = os.environ['USERNAME']
JENKINS_URL = os.environ['JENKINS_URL']
API_TOKEN = os.environ['API_TOKEN']
JENKINS_PORT = os.environ['JENKINS_PORT']

LIFECYCLE_KEY = "LifecycleHookName"
ASG_KEY = "AutoScalingGroupName"
EC2_KEY = "EC2InstanceId"
LIFECYCLE_TOKEN = "LifecycleActionToken"


def jenkins_handler(event, context):
    logger.debug(json.dumps(event, indent=2))

    message = event['detail']
    if LIFECYCLE_KEY in message and ASG_KEY in message:
        logger.debug("Jenkins API call")
        instance_id = message[EC2_KEY]
        instance_ip = discover_instance_ip(instance_id)
        life_cycle_hook = message[LIFECYCLE_KEY]
        auto_scaling_group = message[ASG_KEY]
        life_cycle_token = message[LIFECYCLE_TOKEN]
        url = f"http://{USERNAME}:{API_TOKEN}@{JENKINS_URL}:{JENKINS_PORT}/job/deployment/buildWithParameters?HOST_IP={instance_ip}&HOST_ID={instance_id}"
        print(url)
        response = requests.post(url)
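        # Treat anything other than 201 as a failed Jenkins call.
        # (Previously 'CONTINUE' was assigned unconditionally, overwriting 'ABANDON'.)
        if response.status_code == 201:
            result = 'CONTINUE'
        else:
            logger.error("Jenkins returned HTTP %s", response.status_code)
            result = 'ABANDON'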
        notify_lifecycle(life_cycle_hook, auto_scaling_group, instance_id, life_cycle_token, result)
    return {}

def notify_lifecycle(life_cycle_hook, auto_scaling_group, instance_id, life_cycle_token, result):
    asg_client = boto3.client('autoscaling', region_name=RUNTIME_REGION)
    try:
        response = asg_client.complete_lifecycle_action(
            LifecycleHookName=life_cycle_hook,
            AutoScalingGroupName=auto_scaling_group,
            LifecycleActionToken=life_cycle_token,
            LifecycleActionResult=result,
            InstanceId=instance_id
        )
        logger.debug(response)
    except Exception as e:
        logger.error(
            "Lifecycle hook notified could not be executed: %s", str(e))
        raise e

def discover_instance_ip(instance_id):
    ec2_client = boto3.resource("ec2", region_name=RUNTIME_REGION)
    instance = ec2_client.Instance(instance_id)
    return instance.private_ip_address
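
A side note on timing: the function configuration above sets "Timeout": 3, while the handler performs an EC2 lookup and an HTTP call to Jenkins without an explicit request timeout. If an invocation runs out of time it fails, and failed asynchronous invocations can be retried, which may be worth ruling out as a source of repeated runs. The sketch below shows one defensive option: deriving the HTTP timeout from the time the invocation has left. `request_timeout_seconds` and `FakeContext` are hypothetical names for illustration; `get_remaining_time_in_millis()` is the method the Lambda context object provides.

```python
def request_timeout_seconds(context, safety_margin=0.5, minimum=0.1):
    """Return an HTTP timeout that leaves `safety_margin` seconds
    for the rest of the handler (for example, completing the hook)."""
    remaining = context.get_remaining_time_in_millis() / 1000.0
    return max(minimum, remaining - safety_margin)


class FakeContext:
    """Stand-in for the Lambda context object, for local testing only."""

    def __init__(self, remaining_ms):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self._remaining_ms


# With the full 3 seconds left, the HTTP call may use up to 2.5 seconds.
timeout = request_timeout_seconds(FakeContext(3000))
print(timeout)
```

In the handler, the value could be passed as the `timeout` argument of `requests.post`, so that a slow Jenkins response leads to 'ABANDON' instead of a timed-out invocation.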

I would appreciate any help with this one, or if someone could share an already working solution.

Rafal

The following happens:

If I increase the 'Desired state' of the ASG by 1 (one instance to be launched), everything seems fine:
- the Jenkins job is triggered once
- the ASG instance lifecycle changes from 'Pending:Wait' to 'InService'
- the logs show that the Lambda function ran only once
If I increase the 'Desired state' of the ASG by 2 units (from 1 to 3, adding 2 extra instances), the following happens:
- 4 Jenkins jobs are triggered
- the Jenkins job is triggered twice for the same instance ID
- the ASG instance lifecycle changes from 'Pending:Wait' to 'InService' for all instances
- the logs show that the Lambda function ran twice for every in-service instance managed by the ASG
- two log streams are created
If I increase the 'Desired state' of the ASG by 3 units (from 1 to 4, adding 3 extra instances), the following happens:
- 5 Jenkins jobs are triggered
- the Jenkins job is triggered twice for 2 instances and once for one instance
- the ASG instance lifecycle changes from 'Pending:Wait' to 'InService' for all instances
- the logs show that the Lambda function ran twice for every in-service instance managed by the ASG
- two log streams are created
