EC2 Java StartInstancesRequest goes from "pending" to "stopping" to "stopped"

Question

I have the following situation:

Dedicated tenancy m4.large EC2 instance running RHEL6
Manually starting it using the AWS Console works fine
Lambda function (written in Java) that tries to start it fails: the instance state goes stopped -> pending -> stopping -> stopped

I have a Lambda function that logs all EC2 state changes across the VPC as follows:

'use strict';
exports.handler = (event, context, callback) => {
  console.log('LogEC2InstanceStateChange');
  console.log('Received event:', JSON.stringify(event, null, 2));
  callback(null, 'Finished');
}

And another Lambda function, written in Java, that tries to start EC2 instances on a schedule. It is a lot of code, but the core of it is something like this:

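import java.util.List;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.ec2.model.StartInstancesRequest;
import com.amazonaws.services.ec2.model.StartInstancesResult;

// ec2 is an AmazonEC2 client field (not shown)
public void handleRequest(Object input, Context context) {
  final List<String> instancesToStart = getInstancesToStart(); // implementation not shown
  try {
    // withInstanceIds also accepts a Collection<String>, so no array cast is needed
    StartInstancesRequest startRequest = new StartInstancesRequest().withInstanceIds(instancesToStart);
    context.getLogger().log("StartInstancesRequest: " + startRequest.toString());
    StartInstancesResult res = ec2.startInstances(startRequest);
    context.getLogger().log("StartInstancesResult: " + res.toString());
  }
  catch (Exception e) {
    logException(e); // calls context.getLogger().log on the stack trace string
  }
}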

The instancesToStart list is populated with instance IDs like i-0abcdef1234567890.
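If the IDs are ever passed to the varargs form withInstanceIds(String...), note that the no-argument toArray() on an ArrayList returns an Object[], so casting its result to String[] throws ClassCastException at runtime; the typed overload toArray(new String[0]) returns a real String[]. A minimal sketch, assuming the IDs are held in a List<String> (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class InstanceIdArrays {
    // Converts instance IDs to the String[] expected by varargs methods
    // such as withInstanceIds(String...).
    static String[] toIdArray(List<String> ids) {
        // The typed overload returns a String[]; the no-argument
        // toArray() returns Object[] instead.
        return ids.toArray(new String[0]);
    }

    // Returns true if casting the untyped toArray() result to String[] throws
    // (always the case for an ArrayList, whose toArray() returns Object[]).
    static boolean untypedCastFails(List<String> ids) {
        try {
            String[] cast = (String[]) ids.toArray();
            return cast == null; // only reached if the cast succeeded
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        ids.add("i-0abcdef1234567890");
        System.out.println(toIdArray(ids)[0]);     // prints "i-0abcdef1234567890"
        System.out.println(untypedCastFails(ids)); // prints "true"
    }
}
```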

I create the Lambda functions and all required IAM roles, etc. using CloudFormation. Here is the bit describing the role/permissions for the Java-based Lambda function that does the work:

Resources:
  EC2SchedulerRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
  EC2SchedulerPolicy:
    DependsOn:
      - EC2SchedulerRole
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyName: ec2-scheduler-role
      Roles:
        - !Ref EC2SchedulerRole
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action:
              - 'logs:*'
            Resource:
              - 'arn:aws:logs:*:*:*'
          - Effect: Allow
            Action:
              - 'ec2:DescribeInstanceAttribute'
              - 'ec2:DescribeInstanceStatus'
              - 'ec2:DescribeInstances'
              - 'ec2:StartInstances'
              - 'ec2:StopInstances'
              - 'ec2:DeleteTags'
            Resource:
              - '*'

According to the CloudWatch logs from the first function (the one that logs instance state transitions), this is what happens:

Received event:
{
    "version": "0",
    "id": "",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "12345678",
    "time": "2019-06-20T19:01:35Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:ec2:us-east-1:12345678:instance/i-0abcdef12345678"
    ],
    "detail": {
        "instance-id": "i-0abcdef12345678",
        "state": "pending"
    }
}

Received event:
{
    "version": "0",
    "id": "",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "12345678",
    "time": "2019-06-20T19:01:37Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:ec2:us-east-1:12345678:instance/i-0abcdef12345678"
    ],
    "detail": {
        "instance-id": "i-0abcdef12345678",
        "state": "stopping"
    }
}

Received event:
{
    "version": "0",
    "id": "",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "12345678",
    "time": "2019-06-20T19:01:37Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:ec2:us-east-1:12345678:instance/i-0abcdef12345678"
    ],
    "detail": {
        "instance-id": "i-0abcdef12345678",
        "state": "stopped"
    }
}

And according to the CloudWatch logs from the "worker" function (the function that actually tries to start the instances), we get:

StartInstancesRequest: {InstanceIds: [i-0abcdef12345678],}
StartInstancesResult: {StartingInstances: [{CurrentState: {Code: 0,Name: pending},InstanceId: i-0abcdef12345678,PreviousState: {Code: 80,Name: stopped}}]}

So from the perspective of the Java-based Lambda that does the work, it seems to be doing everything needed to tell the EC2 instance to start; but when the instance actually tries to start, it goes from "pending" to "stopping" to "stopped". If it didn't have permission, it wouldn't even get that far, right?
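The CurrentState in the StartInstancesResult is only a snapshot from the moment EC2 accepted the request: code 0 (pending) shows that the start began, not that the instance will reach running, so a successful API call does not by itself rule out a failure later in the start. For reference, a small sketch that decodes the documented instance-state codes; the EC2 API reference says only the low byte is meaningful (the high byte is for internal use), so the code is masked with 0xFF. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class InstanceStateCodes {
    // Documented EC2 instance-state codes (low byte only).
    private static final Map<Integer, String> NAMES = new HashMap<>();

    static {
        NAMES.put(0, "pending");
        NAMES.put(16, "running");
        NAMES.put(32, "shutting-down");
        NAMES.put(48, "terminated");
        NAMES.put(64, "stopping");
        NAMES.put(80, "stopped");
    }

    // Ignores the high byte, which EC2 reserves for internal use.
    static String nameOf(int code) {
        return NAMES.getOrDefault(code & 0xFF, "unknown");
    }

    // While the state is pending the start has only been accepted;
    // it is confirmed only once the instance reports running.
    static boolean startConfirmed(int code) {
        return (code & 0xFF) == 16;
    }

    public static void main(String[] args) {
        // Codes from the StartInstancesResult in the question.
        System.out.println(nameOf(80) + " -> " + nameOf(0)); // prints "stopped -> pending"
        System.out.println(startConfirmed(0));               // prints "false"
    }
}
```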

If it were an issue with the instance itself (e.g. hardware), I would expect that manually starting it using the AWS Console would fail. But it doesn't fail. It succeeds when started manually!

So what's happening? How do I diagnose this further? Is it permissions or is the instance screwed up?

I'm 99% sure this isn't due to a lack of available capacity in the AZ, because whenever I start the instance manually it works. It's not an ephemeral or recent issue either: it has persisted for several months, with manual starts working 100% of the time and script-based starts working 0% of the time.
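One way to diagnose further is to look at the instance after the start has been aborted: in the AWS SDK for Java (v1), DescribeInstances returns each Instance with getState() and getStateReason(), and the StateReason code and message often say why a start failed. A minimal sketch of a post-start check, assuming a hypothetical StateProbe interface stands in for the DescribeInstances call so the logic runs without AWS; the Snapshot class, method names, and attempt/sleep parameters are all illustrative.

```java
import java.util.Arrays;
import java.util.Iterator;

public class StartVerifier {
    // Hypothetical snapshot of one DescribeInstances result: the state name and
    // the StateReason message (null when EC2 reports none).
    static final class Snapshot {
        final String state;
        final String reason;

        Snapshot(String state, String reason) {
            this.state = state;
            this.reason = reason;
        }
    }

    // Hypothetical abstraction; a real implementation would call
    // ec2.describeInstances(...) and read Instance.getState().getName()
    // and Instance.getStateReason() (which can be null).
    interface StateProbe {
        Snapshot describe(String instanceId);
    }

    // Polls until the instance is running, or is stopped again after a
    // transitional state, and returns a summary line for the Lambda log.
    static String verifyStart(StateProbe probe, String instanceId, int maxAttempts, long sleepMillis) {
        boolean seenTransition = false; // ignore a stale "stopped" seen before the start registers
        for (int i = 0; i < maxAttempts; i++) {
            Snapshot s = probe.describe(instanceId);
            if ("running".equals(s.state)) {
                return instanceId + " started";
            }
            if ("stopped".equals(s.state) && seenTransition) {
                return instanceId + " start aborted: " + s.reason;
            }
            if ("pending".equals(s.state) || "stopping".equals(s.state)) {
                seenTransition = true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return instanceId + " interrupted while waiting";
            }
        }
        return instanceId + " no final state after " + maxAttempts + " attempts";
    }

    public static void main(String[] args) {
        // Replays the sequence from the state-change log: pending -> stopping -> stopped.
        Iterator<Snapshot> states = Arrays.asList(
                new Snapshot("pending", null),
                new Snapshot("stopping", null),
                new Snapshot("stopped", "placeholder reason")).iterator();
        // prints "i-0abcdef12345678 start aborted: placeholder reason"
        System.out.println(verifyStart(id -> states.next(), "i-0abcdef12345678", 10, 0));
    }
}
```

Keeping the AWS call behind an interface keeps the polling rule (success means running; failure means stopped after pending/stopping) separate from the SDK plumbing, and lets the same rule be replayed against logged states.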

Sayantan Mandal · Accepted Answer

Booting from the encrypted EBS volumes might be the issue. As you mentioned, the instance has 3 EBS volumes with KMS encryption. You have to grant the KMS permission kms:CreateGrant to start your instances:

{
    "Sid": "GrantAccess",
    "Effect": "Allow",
    "Action": "kms:CreateGrant",
    "Resource": "arn:aws:kms:::key/1234"
}
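To see the gap concretely: EC2SchedulerPolicy in the question allows logs:* and six ec2: actions, but no kms: actions at all. A simplified sketch that checks whether a list of allowed action patterns covers a required action, handling only the * and ? wildcards case-insensitively. This is a hypothetical helper, not IAM's real evaluation logic, which also considers resources, conditions, explicit denies, and (for KMS) the key policy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class ActionCoverage {
    // Converts an IAM-style action pattern such as "logs:*" into a regex.
    // Only the * and ? wildcards are handled, and both sides are lower-cased
    // because IAM action names are case-insensitive.
    static Pattern toRegex(String actionPattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : actionPattern.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True if any allowed pattern matches the required action.
    static boolean covers(List<String> allowed, String requiredAction) {
        String action = requiredAction.toLowerCase(Locale.ROOT);
        for (String p : allowed) {
            if (toRegex(p).matcher(action).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Actions allowed by EC2SchedulerPolicy in the question.
        List<String> allowed = Arrays.asList(
                "logs:*", "ec2:DescribeInstanceAttribute", "ec2:DescribeInstanceStatus",
                "ec2:DescribeInstances", "ec2:StartInstances", "ec2:StopInstances", "ec2:DeleteTags");
        for (String required : Arrays.asList("ec2:StartInstances", "kms:CreateGrant")) {
            // prints "ec2:StartInstances allowed: true", then "kms:CreateGrant allowed: false"
            System.out.println(required + " allowed: " + covers(allowed, required));
        }
    }
}
```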
