Reputation: 51
We have a setup where 3 ec2 instances each are associated with an elastic ip on its primary network interface eth0 so incoming requests can be served by these instances.
Each of these instances has a secondary network interface eth1 where in the event of a failure/ crash/ reboot of an instance, the elastic ip associated with that instance would be associated to one of the remaining running ec2 instances on that interface. This is some sort of failover mechanism as we always want those elastic ips to be served by some running instance so we don't lose any incoming requests.
The problem I have experienced is specifically on reboot of an instance. When an instance reboots, it cannot get back the public ip it had where this public ip is that of the elastic ip that is now associated with another instance. Thus this instance cannot access the internet unless I manually re-assign the elastic ip back to this instance.
Is it possible to automatically reclaim/re-associate the elastic ip it once had onto its eth1 interface on reboot? If not, do you have suggestions for a workaround?
Reboot is necessary as we would be doing unattended upgrades on the instances.
Update: Also note that I need to use these elastic ips as they are the ones allowed in the firewall of a partner company we integrate with. Using ELBs won't work as its IP changes over time.
Upvotes: 3
Views: 2393
Reputation: 51
So here's how I finally solved this problem. What I missed out on was that Amazon only provides a new public IP to an instance under two conditions.
So based on this, on startup, i configure the instance with two instances but i detach the secondary eth1 interface. Hence this makes the instance eligible for getting a new public IP (if for any reason it reboots).
Now for failover, once one of the running instances detects an instance has gone offline from the cluster (in this case, lets say it rebooted), it will then on the fly attach the secondary interface and associate the elastic IP to it. Hence, the elastic IP is now being served by atleast one of the running instances. The effect is instant.
Now when the failed instance comes back up after reboot, amazon already provided it a new non-elastic public IP. This was because it fulfilled the two conditions of having just one network interface and also its elastic IP was disassociated and re-associated to another running instance. Hence, this rebooted instance now has a new public IP and can connect to the internet on startup and do the necessary tasks it needs to configure itself and re-join the cluster. After that it re-associates back the elastic ip it needed to have.
Also, when the running instance that took over the elastic IP detects a new instance or the rebooted instance has come online, it detaches the secondary interface again so it would be eligible to get a new public ip as well if it rebooted.
This is how i handle the failover and making sure the elastic ips are always served. However this solution is not perfect and can be improved. It can scale to handling N failed/rebooted instances provided N network interfaces can be used for failover!
However if the instance that attached secondary interface(s) during failover reboots, it will not get a new public IP and will remain disconnected from the cluster, but atleast the elastic IPs would still be served by remaining live instances. This is only in the case of reboots.
BTW, atleast from all that i read, these conditions of getting a new public ip wasn't clearly mentioned in the amazon docs.
Upvotes: 2
Reputation: 3005
It sounds like you would be better served by using an elastic load balancer (ELB). You could just use one ELB and it would serve requests to your 3 application servers.
If one goes down, the ELB detects that and stops routing requests there. When it comes back online, the ELB detects that and adds it to the routing group again.
http://aws.amazon.com/elasticloadbalancing/
Upvotes: 0