Reputation: 73
We've recently split our main web app (which runs on EC2 over HTTPS behind a load balancer with autoscaling) into two separate web modules.
The infrastructure now has one load balancer with n servers for the main module (main.elasticbeanstalk.com) and another load balancer with n servers for the secondary module (secondary.elasticbeanstalk.com).
We've created a dedicated HAProxy instance that the domain www.mycompany.com resolves to, and it proxies requests as follows:
- http://www.mycompany.com/fancymodule -> secondary.elasticbeanstalk.com
- http://www.mycompany.com/ -> main.elasticbeanstalk.com
We put it in production and after ~12 hours http://www.mycompany.com/fancymodule starts returning 503 Service Unavailable. If I manually restart HAProxy, everything works wonderfully again.
I've managed to replicate the issue by changing the IP address associated with secondary.elasticbeanstalk.com (e.g. converting from a load balancer to a single instance).
It seems HAProxy is not re-resolving the DNS for secondary.elasticbeanstalk.com, so it gets stuck with the old IP and can no longer reach the web server.
And it's not a short downtime: it doesn't route correctly until I restart the service!
Is it possible that the load balancer's IP, being elastic rather than fixed, gets swapped for a new address and the old one is therefore no longer reachable?
Can someone take a look at this config and tell me if I'm doing something stupid?
global
    log 127.0.0.1:514 local2 info
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats
    tune.ssl.default-dh-param 2048

defaults
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    mode http
    option httplog

frontend mydomain
    log global
    bind *:80
    bind *:443 ssl crt /etc/ssl/certificate.pem
    acl isSsl ssl_fc
    redirect scheme https if !isSsl
    option dontlog-normal
    mode http
    acl secondaryDomain url_beg /fancymodule
    acl liveDomain hdr_end(Host) -i www.mycompany.com
    use_backend live_secondary if secondaryDomain liveDomain
    use_backend live_main if liveDomain
    default_backend live_main

backend live_main
    rspadd Set-Cookie:\ module=main;Path=/
    server main main.elasticbeanstalk.com:80

backend live_secondary
    rspadd Set-Cookie:\ module=secondary;Path=/
    server secondary secondary.elasticbeanstalk.com:80

listen stats :1234
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /stats
    stats auth user:pswd
Upvotes: 3
Views: 3019
Reputation: 902
I've hit the same issue with a setup where a backend server was deployed in AWS. The HAProxy running in our internal network would suddenly take this backend server DOWN with an L7STS/503 check result, while our monitoring was accessing the backend server (directly) just fine. As we run an HAProxy pair (LB01 and LB02), a reload of LB01 immediately worked and the backend server was UP again. On LB02 (not reloaded, on purpose) this backend server remained down.
All this seems to be related to a DNS change of the AWS LB and how HAProxy does DNS caching. By default, HAProxy resolves all DNS records (e.g. for backends) only at startup/reload. The resolved addresses then stay in HAProxy's own DNS cache, so you would have to trigger a reload of HAProxy to renew them.
Another, and without doubt better, solution is to define DNS servers and the HAProxy internal DNS cache TTL. This has been possible since HAProxy version 1.6, with a config snippet like this:
global
    [...]

defaults
    [...]

resolvers mydns
    nameserver dnsmasq 127.0.0.1:53
    nameserver dns1 192.168.1.1:53
    nameserver dns2 192.168.1.253:53
    hold valid 60s

frontend app-in
    bind *:8080
    default_backend app-out

backend app-out
    server appincloud myawslb.example.com:443 check inter 2s ssl verify none resolvers mydns resolve-prefer ipv4
So what this does is define a DNS nameserver set called mydns, using the DNS servers defined by the entries starting with nameserver. An internal DNS cache is kept for 60 seconds, defined by hold valid 60s.
In the backend server's definition you then refer to this nameserver set by adding resolvers mydns. In this example IPv4 addresses are preferred, set by resolve-prefer ipv4 (the default is to use IPv6).
Note that in order to use resolvers on the backend server, check must be defined, too: the DNS lookup happens whenever the backend server check is triggered. In this example check inter 2s is defined, which means a DNS lookup would happen every 2 seconds. That would be quite a lot of lookups, but by setting the internal hold cache to 60 seconds you limit the number of actual DNS queries until the cache expires; a fresh lookup should therefore happen after 62 seconds at the latest.
Starting with HAProxy version 1.8 there is an even more advanced possibility called "Service Discovery over DNS", which uses DNS SRV records. These records contain additional response fields such as priorities, weights, etc., which HAProxy can parse and use to update the backend servers accordingly.
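As a minimal sketch of what that can look like, assuming HAProxy 1.8+ and a hypothetical SRV record _https._tcp.myawslb.example.com (the record name, server count and resolver address are placeholders):

resolvers mydns
    nameserver dns1 192.168.1.1:53
    accepted_payload_size 8192   # allow larger DNS answers, SRV responses can be big
    hold valid 10s

backend app-out
    # server-template creates up to 5 server slots and keeps them in sync with the
    # targets, ports and weights returned for the (hypothetical) SRV record
    server-template awslb 5 _https._tcp.myawslb.example.com resolvers mydns resolve-prefer ipv4 check ssl verify none

Since the SRV answer already carries the target port, no port is given on the server-template line.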
Upvotes: 1
Reputation: 73
I found out that, to improve performance, HAProxy simply replaces the hostnames in the config with the resolved IP addresses at startup. No DNS resolution is done afterwards.
http://www.serverphorums.com/read.php?10,358605 https://serverfault.com/questions/681000/force-haproxy-to-lookup-dns-for-backend-server
So the solution is either to create an autoscaling load balancer setup with HAProxy, or to automate a reload of the service with an external listener that watches for the DNS/IP change.
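For reference, a rough sketch of the resolvers alternative from the other answer, applied to the backends above and assuming HAProxy 1.6+ (the nameserver address and check interval are placeholders, not values from my setup):

resolvers awsdns
    nameserver dns1 10.0.0.2:53   # placeholder: any DNS server reachable from the HAProxy box
    hold valid 60s

backend live_main
    rspadd Set-Cookie:\ module=main;Path=/
    server main main.elasticbeanstalk.com:80 check inter 5s resolvers awsdns resolve-prefer ipv4

backend live_secondary
    rspadd Set-Cookie:\ module=secondary;Path=/
    server secondary secondary.elasticbeanstalk.com:80 check inter 5s resolvers awsdns resolve-prefer ipv4

Note that check has to be added here, since the runtime DNS lookup is tied to the health check.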
Upvotes: 1