Simon

Reputation: 387

AWS Architecture: How to launch multiple short-lived EC2 instances and keep track of the vCPU service quota / limit

To conduct performance tests on different EC2 instance types, I'd like to launch multiple EC2 instances for a short period of time. During launch, each instance runs a user_data (bash) script that performs the measurements, stores the result in an S3 bucket and shuts the instance down.

My current approach:

  1. Lambda function to get relevant Instance types
  2. Lambda pushes instance types as individual messages to an SQS queue
  3. The queue is configured to trigger a 2nd Lambda function, which launches one instance per message with the user_data script to perform the performance measurements

My problem: as the 2nd Lambda function processes the queue messages and spins up new instances, it hits the vCPU limit of my account. Because it may take up to 10 minutes for each instance to complete the measurement, the retries also fail and the remaining messages end up in the DLQ.

Question: How can I launch new instances until the vCPU quota is reached, and then spin up further ones as capacity frees up (running instances shut down after the user_data script has finished)? I probably need to keep track of my current vCPU usage/quota somehow and invoke the Lambda accordingly, but I was not able to come up with a good way to orchestrate the whole process (I'm still a junior dev and fairly new to AWS).

Does anyone have a recommendation on how to tackle this problem? Any input is highly appreciated.

Thx a lot and BR!

Upvotes: 0

Views: 237

Answers (1)

John Rotenstein

Reputation: 270294

Since this is a one-off requirement, it's probably easier to change the way you are launching instances rather than figuring out the complex logic of vCPU Limits.

The Amazon SQS queue will trigger the AWS Lambda function very quickly, so many instances are launched at almost the same time, which is exactly what you don't want to happen.

Instead, I'd recommend launching the instances via a single-threaded script running on your own computer or an EC2 instance (rather than from Lambda). This way, it will only launch one instance at a time. Your script should poll describe_instances() to determine whether the instance has stopped/terminated and then continue to the next instance type.
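A minimal sketch of such a script in Python, assuming boto3 is the SDK in use. The helper names run_one and run_all, the 30-second polling interval, and passing the client in as a parameter are my own choices for illustration, not part of any AWS API. It also assumes the user_data script ends by shutting the instance down, as described in the question.

```python
import time

# States in which the instance no longer counts as running.
TERMINAL_STATES = {"stopped", "terminated"}


def run_one(ec2, instance_type, ami_id, user_data, poll_seconds=30):
    """Launch a single instance and block until it has stopped or terminated.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    resp = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        UserData=user_data,
        # Assumption: terminate (not just stop) when the script shuts down.
        InstanceInitiatedShutdownBehavior="terminate",
    )
    instance_id = resp["Instances"][0]["InstanceId"]

    while True:
        # Wait first: a freshly launched instance may not be visible to
        # describe_instances immediately.
        time.sleep(poll_seconds)
        desc = ec2.describe_instances(InstanceIds=[instance_id])
        state = desc["Reservations"][0]["Instances"][0]["State"]["Name"]
        if state in TERMINAL_STATES:
            return instance_id, state


def run_all(ec2, instance_types, ami_id, user_data, poll_seconds=30):
    """Test the instance types strictly one after another."""
    results = []
    for instance_type in instance_types:
        instance_id, state = run_one(
            ec2, instance_type, ami_id, user_data, poll_seconds
        )
        results.append((instance_type, instance_id, state))
    return results
```

You would call it with something like run_all(boto3.client("ec2"), ["t3.micro", "m5.large"], ami_id, user_data), where ami_id and user_data are placeholders for your own values. A real script would probably also want a timeout per instance and some error handling around the describe call.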

This will be slower than doing all tests in parallel, but it will avoid hitting any limits.

You can make it faster by running tests in multiple regions -- divide the instance types into a few groups, pick a region for each group and then run multiple copies of your script (one per region). The copies can run in parallel on the same computer (e.g. using sh script.sh &), with each copy controlling activities in a different region.
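The grouping step could look roughly like this in Python. Note that this sketch uses threads inside one process instead of separate shell processes, which is a swap on my part. split_by_region, run_regions and the worker callable are hypothetical names. The worker stands for whatever sequential per-region routine you use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_by_region(instance_types, regions):
    """Distribute instance types round-robin over the given regions."""
    groups = {region: [] for region in regions}
    for i, instance_type in enumerate(instance_types):
        groups[regions[i % len(regions)]].append(instance_type)
    return groups


def run_regions(groups, worker):
    """Run worker(region, instance_types) for every region in parallel.

    Each worker is expected to process its own list sequentially, so at most
    one test instance per region is running at any time.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = {
            region: pool.submit(worker, region, types)
            for region, types in groups.items()
        }
        return {region: future.result() for region, future in futures.items()}
```

Keep in mind that vCPU quotas apply per region and that not every instance type is offered in every region, so the groups may need manual adjustment.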

Upvotes: 0
