Update autoscaling group AMIs and running instances

Question

I'm trying to set up the following for a project:

EC2 instances in an auto scaling group, behind an elastic load balancer
A CodeDeploy application to deploy new versions of my application to the EC2 instances

I have a question regarding the AMIs on which the EC2 instances are based. If I want to make some changes to the systems' configuration (say, update the libssl package), I see three options:

(1) run packer / manually create a new AMI and set up my auto scaling group to use it. Then, restart the instances so they use the new AMI. This is obviously really slow and causes downtime.
(2) use a configuration management tool such as Ansible to run yum update libssl on the instances, but this would not persist the changes to instances launched in the future.
(3) create a new AMI (manually or using packer) and then use a configuration management tool to shut down the old instances and launch new ones from the new AMI. This is the option I think is best, but I'm not sure how to do it in detail, nor how to avoid downtime. Also, it would remain quite slow (~10 min, I guess).

What would be the best way to do this (avoiding downtime)? Are there any best practices I should stick to?

Thanks

[Edit] I came across aws-ha-release from aws-missing-tools, which makes it possible to restart all instances in an auto scaling group without any downtime. I guess this could be used in conjunction with packer to force the running instances to use the new AMI. Any feedback on this? I feel like it's a little bit hacky.

BestPractices · Accepted Answer

Here are some options:

1 Use Two Autoscale Groups

If you are trying to prevent downtime while deploying new code, take advantage of the fact that an ELB can have multiple autoscale groups/launch configs associated with it.

You can have:

autoscale-A, launchconfig-A, which are the autoscale group and launch config of version "A" of your servers.
autoscale-B, launchconfig-B, which are the autoscale group and launch config of version "B" of your servers.

A represents version X of the code, and B represents version X + 1 (including any changes to O/S configuration such as libssl).

Now when you want to roll out version X + 1 of your code, simply "bake" a new AMI, configured exactly how you like it, and add autoscale group B to the ELB. Once the autoscale group and its instances are in service, set the min/max/desired capacity of autoscale group A to 0, taking the version X servers out of the ELB. Only your version X + 1 servers will be running. When new instances come up in the future, e.g. if a server fails, they'll use your X + 1 AMI and have all of its configuration changes.
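As a rough illustration of that swap, here is a minimal Python sketch. It assumes a classic ELB and takes a boto3 Auto Scaling client as an argument; the function name swap_groups, the polling interval and the timeout are hypothetical choices of mine, not part of any AWS tooling. It also only checks the Auto Scaling group's own view of instance state, so a stricter version would query the load balancer's instance health as well.

```python
import time


def count_in_service(group):
    """Count instances the Auto Scaling group reports as healthy and in service."""
    return sum(
        1
        for instance in group["Instances"]
        if instance["LifecycleState"] == "InService"
        and instance["HealthStatus"] == "Healthy"
    )


def swap_groups(autoscaling, elb_name, old_group, new_group,
                poll_seconds=15, timeout_seconds=900):
    """Attach new_group to the ELB, wait for it, then scale old_group to zero.

    `autoscaling` is expected to behave like boto3.client("autoscaling").
    """
    # Put the version X + 1 group behind the same (classic) load balancer.
    autoscaling.attach_load_balancers(
        AutoScalingGroupName=new_group,
        LoadBalancerNames=[elb_name],
    )

    deadline = time.monotonic() + timeout_seconds
    while True:
        group = autoscaling.describe_auto_scaling_groups(
            AutoScalingGroupNames=[new_group]
        )["AutoScalingGroups"][0]
        desired = group["DesiredCapacity"]
        if desired > 0 and count_in_service(group) >= desired:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"{new_group} did not come into service in time")
        time.sleep(poll_seconds)

    # Only now take the version X servers out of rotation.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=old_group,
        MinSize=0,
        MaxSize=0,
        DesiredCapacity=0,
    )
```

It could be called as swap_groups(boto3.client("autoscaling"), "my-elb", "autoscale-A", "autoscale-B"), where "my-elb" is a placeholder load balancer name. Passing the client in keeps the function easy to try out against a stub before pointing it at a real account.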

Note: if your application talks to a database, you will need to ensure that version X and version X + 1 of the code can operate on the same version of the database. For example, if version X + 1 removes a table that version X uses, users hitting version X of your application will get errors. Option 1 works well when there are either no database changes in your code release, or when you've built in backwards compatibility for the new version of the code.

2 Combine Config Management Tool with the Health Check

If all you want to do is update the O/S (e.g. apply a patch), you can combine your idea of using a tool like Ansible with the ELB health check.

When you want to patch a server, scale up your number of instances temporarily, e.g. if you were running 3 instances, scale up to 6.
As part of their user data, run Ansible, and only once it successfully completes (e.g. libssl is updated) do you allow the health check to pass and the EC2 instance to serve traffic from the ELB.
Once the ELB is successfully seeing the new EC2 instances, scale the auto scale group back down to its original capacity (in this case, 3).
Note: Which instances AWS terminates depends on the group's termination policy. With the OldestInstance policy, the oldest instances are terminated first, so the only instances left running will be your 3 new ones.
If an instance fails and a new one spins up, it will start from your base AMI and apply any Ansible changes; only once the changes are present will the health check pass and the instance be put in service.
(This is your (2), but it fixes the issue of new instances not containing the libssl version change.)
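The scale-up / scale-down steps above can be sketched in Python as follows. This is only a sketch under a few assumptions: the function takes a boto3 Auto Scaling client as an argument, the name roll_instances and the timing parameters are hypothetical, and the instances' user data is assumed to already run Ansible and gate the health check as described. It sets the OldestInstance termination policy explicitly so that the scale-down removes the pre-patch instances.

```python
import time


def count_in_service(group):
    """Count instances the Auto Scaling group reports as healthy and in service."""
    return sum(
        1
        for instance in group["Instances"]
        if instance["LifecycleState"] == "InService"
        and instance["HealthStatus"] == "Healthy"
    )


def describe_group(autoscaling, group_name):
    """Fetch the current description of a single Auto Scaling group."""
    return autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )["AutoScalingGroups"][0]


def roll_instances(autoscaling, group_name, poll_seconds=15, timeout_seconds=1200):
    """Double the group, wait for the new instances, then shrink it back.

    `autoscaling` is expected to behave like boto3.client("autoscaling").
    """
    group = describe_group(autoscaling, group_name)
    original_desired = group["DesiredCapacity"]
    original_max = group["MaxSize"]
    doubled = original_desired * 2

    # Scale up (e.g. 3 -> 6), raising MaxSize if needed, and make sure the
    # later scale-down removes the oldest instances first.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MaxSize=max(original_max, doubled),
        DesiredCapacity=doubled,
        TerminationPolicies=["OldestInstance"],
    )

    deadline = time.monotonic() + timeout_seconds
    while count_in_service(describe_group(autoscaling, group_name)) < doubled:
        if time.monotonic() > deadline:
            raise TimeoutError(f"{group_name} did not reach {doubled} instances")
        time.sleep(poll_seconds)

    # Scale back down to the original size; the old instances are terminated.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MaxSize=original_max,
        DesiredCapacity=original_desired,
    )
```

A call such as roll_instances(boto3.client("autoscaling"), "autoscale-A") would run the whole cycle. As in the description above, the group's own in-service count stands in for the ELB's view of instance health; checking the load balancer directly would be the more conservative choice.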

Note on speed

Option 1 gets replacements for failed instances into service faster than Option 2 (since you are not waiting on Ansible to run), at the expense of having to "pre-bake" your AMI.
Option 2 gives you greater flexibility and speed for patching production servers, e.g. if you need to "patch something now" this might be the quickest way. Having something like Ansible running, with the ability to patch the O/S separately from deploying code, can come with additional advantages depending on your use case. An agent-less hook into your servers' configuration (libraries, user management, etc.) is quite powerful, especially in the cloud.
