Reputation: 923
We are having difficulty choosing a load balancing solution (Load Balancer, Application Gateway, Traffic Manager, Front Door) for IIS websites on Azure VMs. The simple use case of two identical sites is well covered: just use Azure Load Balancer or Application Gateway. However, when we want to update the websites and test those updates, we run into the limitations of these load balancing solutions.
For example, if we would like to update IIS websites on VM1 and test those updates, the strategy would be:
We would like to know the best way to direct traffic to only one VM. So far, we see only one option: remove a VM from the backend address pool, add it back afterwards, and repeat the process for the other VMs. Surely there must be a better way to direct 100% of traffic to one VM (or to specific VMs), right?
Update:
We ended up blocking the connection between the VMs and the Load Balancer by creating a Network Security Group rule with a Deny action on the AzureLoadBalancer service tag. When we want that particular VM to be accessible again, we switch the NSG rule from Deny to Allow.
The downside of this approach is that the change takes 1-3 minutes to take effect (see "Continuous Delivery with Azure Load Balancer").
If anybody can think of a faster (or instantaneous) solution for this, please let me know.
Upvotes: 3
Views: 1197
Reputation: 923
We ended up blocking the connection between the VMs and the Load Balancer by creating a Network Security Group rule with a Deny action on the AzureLoadBalancer service tag. When we want that particular VM to be accessible again, we switch the NSG rule from Deny to Allow.
The downside of this approach is that the change takes 1-3 minutes to take effect (see "Continuous Delivery with Azure Load Balancer").
If anybody can think of a faster (or instantaneous) solution for this, please let me know.
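For anyone who wants to script the Deny/Allow switch rather than click through the portal, here is a minimal sketch that builds the Azure CLI call. It assumes the Azure CLI is installed and logged in and that the NSG rule already exists; the resource group, NSG, and rule names are hypothetical placeholders. It does not make the change propagate any faster, it only makes the toggle repeatable.

```python
import subprocess


def nsg_rule_update_cmd(resource_group, nsg_name, rule_name, access):
    """Build the `az network nsg rule update` argv that sets a rule to Allow or Deny."""
    if access not in ("Allow", "Deny"):
        raise ValueError(f"access must be 'Allow' or 'Deny', got {access!r}")
    return [
        "az", "network", "nsg", "rule", "update",
        "--resource-group", resource_group,
        "--nsg-name", nsg_name,
        "--name", rule_name,
        "--access", access,
    ]


def set_lb_access(resource_group, nsg_name, rule_name, access):
    """Run the command. On Windows the executable may need to be given as 'az.cmd'."""
    subprocess.run(
        nsg_rule_update_cmd(resource_group, nsg_name, rule_name, access),
        check=True,
    )


# Hypothetical names: replace with your own resource group, NSG and rule.
# Take VM1 out of rotation:  set_lb_access("my-rg", "vm1-nsg", "lb-probe", "Deny")
# Put it back afterwards:    set_lb_access("my-rg", "vm1-nsg", "lb-probe", "Allow")
```

Keeping the command builder separate from the function that runs it makes it easy to print and review the exact command before letting a deployment script execute it.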
Upvotes: 1
Reputation: 704
I had exactly the same requirement in an Azure environment that I built a few years ago. Azure Front Door didn't exist yet, and I had looked into using the Azure API to automate adding and removing backend servers the way you described. It worked sometimes, but I found the Azure API unreliable (lots of 503s when reconfiguring the load balancer) and very slow to divert traffic to or from servers as I added or removed them from my cluster.
The solution that follows probably won't be well received if you are looking for an answer which purely relies upon Azure resources, but this is what I devised:
I configured an Azure load balancer with the simplest possible HTTP and HTTPS round-robin load balancing of requests on my external IP to two small Azure VMs running Debian with HAProxy. I then configured each HAProxy VM with backends for the actual IIS servers. I configured the two HAProxy VMs in an availability set such that Microsoft should not ever reboot them simultaneously for maintenance.
HAProxy is an excellent and very robust load balancer that supports nearly every imaginable load balancing scenario. Crucially for your question, it can also listen on a socket for commands that control the status of the backends. I configured the following in the global section of my haproxy.cfg:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats socket ipv4@192.168.95.100:9001 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
In my example, 192.168.95.100 is the first HAProxy VM, and 192.168.95.101 is the second. On the second server, these lines would be identical except for its internal IP.
Let's say you have an HAProxy frontend and backend for your HTTPS traffic to two web servers, ws1pro and ws2pro, with the IPs 192.168.95.10 and 192.168.95.11 respectively. For simplicity's sake, I'll assume we don't need to worry about HTTP session state differences across the two servers (e.g. because of out-of-process session state), so we just divert HTTPS connections to one node or the other:
listen stats
    bind *:8080
    mode http
    stats enable
    stats refresh 10s
    stats show-desc Load Balancer
    stats show-legends
    stats uri /

frontend www_https
    bind *:443
    mode tcp
    option tcplog
    default_backend backend_https

backend backend_https
    mode tcp
    balance roundrobin
    server ws1pro 192.168.95.10:443 check inter 5s
    server ws2pro 192.168.95.11:443 check inter 5s
With the configuration above, since both HAProxy VMs are listening for admin commands on port 9001, and the Azure load balancer is sending the client's requests to either VM, we need to tell both servers to disable the same backend.
I used Socat to send the cluster control commands. You could do this from a Linux VM, but there is also a Windows version of Socat, and I used that in a set of really simple batch files. The cluster control commands would be the same in Bash.
stop_ws1pro.bat:

    echo disable server backend_https/ws1pro | socat - TCP4:192.168.95.100:9001
    echo disable server backend_https/ws1pro | socat - TCP4:192.168.95.101:9001

start_ws1pro.bat:

    echo enable server backend_https/ws1pro | socat - TCP4:192.168.95.100:9001
    echo enable server backend_https/ws1pro | socat - TCP4:192.168.95.101:9001
These admin commands execute almost instantly. Since the HAProxy configuration above enables the stats page, you should be able to watch the status change happen on the stats page as soon as it refreshes. The stats page will show the connections or sessions draining from the server you disabled over to the remaining enabled servers when you disable a backend, and then show them returning to the server once it is enabled again.
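If you would rather not install Socat, the same commands can be sent with a few lines of Python. This is only a sketch under the assumptions of the configuration above (admin sockets on port 9001 of both HAProxy VMs, backend named backend_https); the function names are my own, and it relies on HAProxy closing the connection after answering a single command, which is what it does when the socket is not in interactive mode.

```python
import socket

VALID_ACTIONS = ("disable", "enable")


def build_command(action, backend, server):
    """Return the admin-socket command line, e.g. 'disable server backend_https/ws1pro'."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    return f"{action} server {backend}/{server}\n"


def send_command(host, port, command, timeout=5.0):
    """Send one command to an HAProxy admin socket and return whatever it replies."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)  # raises a timeout error if the peer never closes
            if not data:            # empty read: the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def set_server_state(action, backend, server, proxies):
    """Apply the same action on every HAProxy node; returns {host: reply}."""
    command = build_command(action, backend, server)
    return {host: send_command(host, port, command) for host, port in proxies}


# Usage with the addresses from this answer (both nodes must get the command):
# proxies = [("192.168.95.100", 9001), ("192.168.95.101", 9001)]
# set_server_state("disable", "backend_https", "ws1pro", proxies)
# ... update and test ws1pro ...
# set_server_state("enable", "backend_https", "ws1pro", proxies)
```

Looping over both nodes in one call avoids the situation where one HAProxy VM still sends traffic to a server that the other has already disabled.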
Upvotes: 1
Reputation: 29208
Without any Azure specifics, the usual pattern is to point the load balancer at a /status endpoint of your process and to design the endpoint's behavior according to your needs, e.g.:
Meanwhile the load balancer polls the /status endpoint every minute and knows to mark down / exclude forwarding for any servers not in the 'ok' state.
Some load balancers / gateways may work best with HTTP status codes whereas others may be able to read response text from the status endpoint. Pretty much all of them will support this general behavior though - you should not need an expensive solution.
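To make the pattern concrete, here is a minimal sketch of such an endpoint in Python. Everything specific in it is an assumption for illustration: the flag file name, the port, and the choice of 200 "ok" versus 503 "maintenance" as the two states. On IIS you would implement the same idea in whatever your site is built with; the point is only that creating or deleting one file takes the server out of or back into rotation at the next probe.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical flag file: create it to take this server out of rotation.
MAINTENANCE_FLAG = "maintenance.flag"


def status_response(in_maintenance):
    """Map the server state to an HTTP status code and a plain-text body."""
    if in_maintenance:
        return 503, b"maintenance"
    return 200, b"ok"


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/status":
            self.send_error(404)
            return
        code, body = status_response(os.path.exists(MAINTENANCE_FLAG))
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8081):
    """Serve /status until interrupted; the port number is an arbitrary example."""
    HTTPServer(("", port), StatusHandler).serve_forever()
```

Returning both a status code and a text body covers load balancers that only look at the HTTP status as well as those that match on the response text.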
Upvotes: 1