allquixotic

Reputation: 1569

EC2 Java StartInstancesRequest goes from "pending" to "stopping" to "stopped"

I have the following situation:

I have a Lambda function that logs all EC2 state changes across the VPC as follows:

'use strict';
exports.handler = (event, context, callback) => {
  console.log('LogEC2InstanceStateChange');
  console.log('Received event:', JSON.stringify(event, null, 2));
  callback(null, 'Finished');
}

I also have another Lambda function, written in Java, that starts EC2 instances on a schedule. It's a lot of code, but its core is something like this:

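import java.util.List;
import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.model.StartInstancesRequest;
import com.amazonaws.services.ec2.model.StartInstancesResult;
import com.amazonaws.services.lambda.runtime.Context;

// ec2 is an AmazonEC2 client field on the handler class (not shown)
public void handleRequest(Object input, Context context) {
  final List<String> instancesToStart = getInstancesToStart(); // implementation not shown
  try {
    // withInstanceIds also accepts a Collection<String>, so no array cast is needed
    StartInstancesRequest startRequest = new StartInstancesRequest().withInstanceIds(instancesToStart);
    context.getLogger().log("StartInstancesRequest: " + startRequest.toString());
    StartInstancesResult res = ec2.startInstances(startRequest);
    context.getLogger().log("StartInstancesResult: " + res.toString());
  }
  catch (Exception e) {
    logException(e); // calls context.getLogger().log on the stack trace string
  }
}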

The instancesToStart array is populated with instance IDs like i-0abcdef1234567890.
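A side note on converting the list: the no-argument `List.toArray()` returns `Object[]` for an `ArrayList`, so a cast like `(String[]) instancesToStart.toArray()` throws a `ClassCastException` at runtime. The typed overload `toArray(new String[0])` returns a real `String[]`. A minimal standalone sketch (the class name and sample ID are illustrative only):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ToArrayDemo {
    // Converts the list with the typed overload, which returns a real String[].
    static String[] toStringArray(List<String> ids) {
        return ids.toArray(new String[0]);
    }

    // Returns true if casting the untyped toArray() result to String[] fails.
    static boolean untypedCastFails(List<String> ids) {
        try {
            String[] unused = (String[]) ids.toArray();
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>(Arrays.asList("i-0abcdef1234567890"));
        System.out.println(Arrays.toString(toStringArray(ids)));
        System.out.println(untypedCastFails(ids));
    }
}
```

With the AWS SDK for Java 1.x you can also skip the conversion entirely, since `withInstanceIds` has a `Collection<String>` overload.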

I create the Lambda functions and all required IAM roles, etc. using CloudFormation. Here is the bit describing the role/permissions for the Java-based Lambda function that does the work:

Resources:
  EC2SchedulerRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
  EC2SchedulerPolicy:
    DependsOn:
      - EC2SchedulerRole
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyName: ec2-scheduler-role
      Roles:
        - !Ref EC2SchedulerRole
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action:
              - 'logs:*'
            Resource:
              - 'arn:aws:logs:*:*:*'
          - Effect: Allow
            Action:
              - 'ec2:DescribeInstanceAttribute'
              - 'ec2:DescribeInstanceStatus'
              - 'ec2:DescribeInstances'
              - 'ec2:StartInstances'
              - 'ec2:StopInstances'
              - 'ec2:DeleteTags'
            Resource:
              - '*'

According to the CloudWatch logs of the first function (the one that logs instance state transitions), this is what happens:

Received event:
{
    "version": "0",
    "id": "<guid>",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "12345678",
    "time": "2019-06-20T19:01:35Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:ec2:us-east-1:12345678:instance/i-0abcdef12345678"
    ],
    "detail": {
        "instance-id": "i-0abcdef12345678",
        "state": "pending"
    }
}

Received event:
{
    "version": "0",
    "id": "<guid>",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "12345678",
    "time": "2019-06-20T19:01:37Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:ec2:us-east-1:12345678:instance/i-0abcdef12345678"
    ],
    "detail": {
        "instance-id": "i-0abcdef12345678",
        "state": "stopping"
    }
}

Received event:
{
    "version": "0",
    "id": "<guid>",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "12345678",
    "time": "2019-06-20T19:01:37Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:ec2:us-east-1:12345678:instance/i-0abcdef12345678"
    ],
    "detail": {
        "instance-id": "i-0abcdef12345678",
        "state": "stopped"
    }
}

And according to the CloudWatch logs from the "worker" function (the function that actually tries to start the instances), we get:

StartInstancesRequest: {InstanceIds: [i-0abcdef12345678],}
StartInstancesResult: {StartingInstances: [{CurrentState: {Code: 0,Name: pending},InstanceId: i-0abcdef12345678,PreviousState: {Code: 80,Name: stopped}}]}
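The numeric codes in that result are EC2 instance-state codes. Per the EC2 API documentation, the low byte encodes the state (0 pending, 16 running, 32 shutting-down, 48 terminated, 64 stopping, 80 stopped) and the high byte is for internal use and should be ignored. A small decoder sketch (the class and method names are illustrative, not part of the AWS SDK):

```java
class InstanceStateCodes {
    // Maps an EC2 instance-state code to its name, masking off the high byte.
    static String stateName(int code) {
        switch (code & 0xFF) {
            case 0:  return "pending";
            case 16: return "running";
            case 32: return "shutting-down";
            case 48: return "terminated";
            case 64: return "stopping";
            case 80: return "stopped";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        // The logged result: PreviousState code 80, CurrentState code 0.
        System.out.println(stateName(80) + " -> " + stateName(0));
    }
}
```

So the result only confirms that EC2 accepted the request and moved the instance from stopped into pending; it says nothing about whether the instance ever reached running.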

So from the perspective of the Java-based Lambda, it does everything needed to tell EC2 to start the instance; but when the instance actually tries to start, it goes from "pending" to "stopping" to "stopped". If the function lacked permission, it wouldn't even get that far, right?
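The telling detail is that the instance never reaches "running": it goes straight from "pending" to "stopping". When correlating the state-change log with the scheduler's log, a check along these lines flags such aborted starts. This is a hypothetical helper for illustration, not part of the original code:

```java
import java.util.Arrays;
import java.util.List;

class StartOutcome {
    // Returns true if the sequence enters "pending" and then reaches "stopped"
    // without passing through "running" in between (an aborted start).
    static boolean abortedStart(List<String> states) {
        boolean pending = false;
        for (String s : states) {
            if (s.equals("pending")) {
                pending = true;
            } else if (s.equals("running")) {
                pending = false;
            } else if (s.equals("stopped") && pending) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(abortedStart(Arrays.asList("pending", "stopping", "stopped")));
        System.out.println(abortedStart(Arrays.asList("pending", "running", "stopping", "stopped")));
    }
}
```

When this pattern shows up, the instance's `StateReason`, as returned by `DescribeInstances`, is usually the next place to look for the cause.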

If it were a problem with the instance itself (e.g. hardware), I would expect starting it manually from the AWS Console to fail too. But it doesn't: it succeeds when started manually!

So what's happening? How do I diagnose this further? Is it permissions or is the instance screwed up?

I'm 99% sure this isn't due to a lack of available capacity in the AZ, because whenever I start the instance manually, it works. It's not an intermittent issue or something that started recently: it has persisted for several months, with manual starts working 100% of the time and script-based starts working 0% of the time.

Upvotes: 0

Views: 1047

Answers (2)

Sayantan Mandal

Reputation: 1336

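The encrypted EBS volumes are the likely issue. As you mentioned, the instance has 3 EBS volumes encrypted with KMS. To start such an instance, the role that calls StartInstances needs KMS permission (kms:CreateGrant) on the key. This would also explain why starting from the console works: your own IAM user presumably already has access to the key, while the Lambda role does not.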

{
  "Sid": "GrantAccess",
  "Effect": "Allow",
  "Action": "kms:CreateGrant",
  "Resource": "arn:aws:kms:::key/1234"
}
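To confirm that the question's policy doesn't cover this action: IAM action names in a policy may contain `*` and `?` wildcards and are matched case-insensitively. The sketch below is a simplified illustration (it ignores `Deny`, `NotAction`, resources and conditions, and the class name is hypothetical) that checks the actions listed in `EC2SchedulerPolicy` against `kms:CreateGrant`:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ActionMatcher {
    // Converts an IAM action pattern such as "logs:*" into a regex:
    // '*' matches any run of characters, '?' matches exactly one.
    static Pattern toRegex(String actionPattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : actionPattern.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE);
    }

    // True if any allowed pattern matches the requested action.
    static boolean allows(List<String> allowed, String action) {
        for (String p : allowed) {
            if (toRegex(p).matcher(action).matches()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Actions allowed by EC2SchedulerPolicy in the question.
        List<String> allowed = Arrays.asList(
            "logs:*",
            "ec2:DescribeInstanceAttribute", "ec2:DescribeInstanceStatus",
            "ec2:DescribeInstances", "ec2:StartInstances",
            "ec2:StopInstances", "ec2:DeleteTags");
        System.out.println(allows(allowed, "ec2:StartInstances"));
        System.out.println(allows(allowed, "kms:CreateGrant"));
    }
}
```

StartInstances is allowed but kms:CreateGrant is not, which matches the symptom: the API call succeeds, and the failure only surfaces once EC2 needs the key to attach the volumes.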

Upvotes: 5

deosha

Reputation: 990

Try this policy and see if it works. If it does, the problem is in your policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Start*",
        "ec2:Stop*"
      ],
      "Resource": "*"
    }
  ]
}

Upvotes: 0
