Malo87

Reputation: 73

HAProxy and AWS load balancer - 503 error

We've recently split our main web app (which runs on EC2 over HTTPS behind a load balancer with autoscaling) into two separate web modules.

The infrastructure now has one load balancer with n servers for the main module (main.elasticbeanstalk.com) and another load balancer with n servers for the secondary module (secondary.elasticbeanstalk.com).

We've created a dedicated HAProxy instance that the domain www.mycompany.com resolves to, and it proxies requests as follows:

www.mycompany.com/fancymodule -> secondary.elasticbeanstalk.com

www.mycompany.com/ -> main.elasticbeanstalk.com

We put it in production and after ~12 hours http://www.mycompany.com/fancymodule started returning 503 Service Unavailable. If I manually restart HAProxy, everything starts working wonderfully again.

I've managed to replicate the issue by renewing the IP address associated with secondary.elasticbeanstalk.com (e.g. by converting from a load balancer to a single instance).

It seems like HAProxy is not re-resolving the DNS for secondary.elasticbeanstalk.com, so it gets stuck with the old IP and can no longer reach the web server.

And it's not a short downtime! It doesn't route correctly until I restart the service!

Is it possible that the load balancer, since its IP is elastic rather than static, gets associated with a new IP address and is therefore no longer reachable?

Can someone take a look at this config and tell me if I'm doing something stupid?

global
  log     127.0.0.1:514 local2 info
  chroot  /var/lib/haproxy
  pidfile /var/run/haproxy.pid
  maxconn 4000
  user    haproxy
  group   haproxy
  daemon

  # turn on stats unix socket
  stats socket /var/lib/haproxy/stats
  tune.ssl.default-dh-param 2048

defaults
  retries                 3
  timeout http-request    10s
  timeout queue           1m
  timeout connect         10s
  timeout client          1m
  timeout server          1m
  timeout http-keep-alive 10s
  timeout check           10s
  mode    http
  option  httplog

frontend mydomain
  log global
  bind *:80
  bind *:443 ssl crt /etc/ssl/certificate.pem
  acl isSsl ssl_fc
  redirect scheme https if !isSsl
  option dontlog-normal
  mode http

  acl secondaryDomain url_beg /fancymodule

  acl liveDomain hdr_end(Host) -i www.mycompany.com

  use_backend live_secondary if secondaryDomain liveDomain
  use_backend live_main if liveDomain

  default_backend live_main

backend live_main
  rspadd Set-Cookie:\ module=main;Path=/
  server main main.elasticbeanstalk.com:80

backend live_secondary
  rspadd Set-Cookie:\ module=secondary;Path=/
  server secondary secondary.elasticbeanstalk.com:80

listen stats :1234
  mode http
  stats enable
  stats hide-version
  stats realm Haproxy\ Statistics
  stats uri /stats
  stats auth user:pswd

Upvotes: 3

Views: 3019

Answers (2)

Claudio Kuenzler

Reputation: 902

I've hit the same issue with a setup where a backend server was deployed in AWS. The HAProxy running in our internal networks would suddenly take this backend server DOWN with an L7STS/503 check result, while our monitoring was accessing the backend server (directly) just fine. As we run an HAProxy pair (LB01 and LB02), a reload of LB01 immediately worked and the backend server was UP again. On LB02 (not reloaded on purpose) this backend server was still down.

All this seems to be related to a DNS change of the AWS LB and how HAProxy does DNS caching. By default, HAProxy resolves all DNS records (e.g. for backends) at startup/reload. These resolved DNS records then stay in HAProxy's own DNS cache, so you would have to launch a reload of HAProxy to renew the DNS cache.

Another, and without doubt better, solution is to define DNS servers and the HAProxy internal DNS cache TTL. This has been possible since HAProxy version 1.6, with a config snippet like this:

global
[...]

defaults
[...]

resolvers mydns
  nameserver dnsmasq 127.0.0.1:53
  nameserver dns1 192.168.1.1:53
  nameserver dns2 192.168.1.253:53
  hold valid 60s

frontend app-in
  bind *:8080
  default_backend app-out

backend app-out
  server appincloud myawslb.example.com:443 check inter 2s ssl verify none resolvers mydns resolve-prefer ipv4 

What this does is define a DNS nameserver set called mydns, using the DNS servers listed in the entries starting with nameserver. An internal DNS cache is kept for 60s, defined by hold valid 60s. In the backend server's definition you then refer to this nameserver set by adding resolvers mydns. In this example IPv4 addresses are preferred by adding resolve-prefer ipv4 (the default is to use IPv6).

Note that in order to use resolvers in the backend server, check must be defined, too. The DNS lookup happens whenever the backend server check is triggered. In this example check inter 2s is defined, which means a DNS lookup would happen every 2 seconds. That would be quite a lot of lookups. By setting the internal hold cache to 60 seconds, you can limit the number of DNS lookups until the cache expires; a new DNS lookup should therefore happen after 62 seconds at the latest.
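Applied to the backends from the question, this would look roughly as follows. It is only a sketch: the resolvers section name, the check interval and the nameserver address (here 10.0.0.2, typically the VPC resolver) are assumptions you would need to adapt:

resolvers awsdns
  # placeholder nameserver: point this at the VPC resolver or the entries in /etc/resolv.conf
  nameserver vpc 10.0.0.2:53
  hold valid 60s

backend live_main
  rspadd Set-Cookie:\ module=main;Path=/
  server main main.elasticbeanstalk.com:80 check inter 5s resolvers awsdns resolve-prefer ipv4

backend live_secondary
  rspadd Set-Cookie:\ module=secondary;Path=/
  server secondary secondary.elasticbeanstalk.com:80 check inter 5s resolvers awsdns resolve-prefer ipv4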

Starting with HAProxy version 1.8 there is even an advanced possibility called "Service Discovery over DNS" which uses DNS SRV Records. These records contain multiple response fields such as priorities, weights, etc. which can be parsed by HAProxy and update the backends accordingly.
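A minimal, illustrative sketch of such a setup, reusing the mydns resolvers section from above (the SRV record name and the number of server slots are made-up examples):

backend app-out
  # HAProxy 1.8+: up to 5 server slots, filled and updated from the answers to the SRV lookup
  server-template appincloud 5 _https._tcp.myawslb.example.com resolvers mydns check ssl verify none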


Upvotes: 1

Malo87

Reputation: 73

I found out that, to improve performance, HAProxy just replaces the domain names in the config with the actual IP addresses at startup. No DNS resolution is done afterwards.

http://www.serverphorums.com/read.php?10,358605
https://serverfault.com/questions/681000/force-haproxy-to-lookup-dns-for-backend-server

So the solution is either to create an autoscaling load balancer setup with HAProxy, or to automate a reload of the service with an external listener that reacts to the DNS/IP change (a rough sketch follows).
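For the second option, something like the following Python watcher running on the HAProxy host would be enough. This is only a sketch under assumptions: the hostname, polling interval and reload command are placeholders to adapt to your setup.

#!/usr/bin/env python3
# Rough sketch: reload HAProxy whenever the backend hostname resolves to a
# different set of IPs. Hostname, interval and reload command are assumptions.
import socket
import subprocess
import time

HOSTNAME = "secondary.elasticbeanstalk.com"      # backend whose IP may change
RELOAD_CMD = ["systemctl", "reload", "haproxy"]  # adapt to your init system
INTERVAL = 30                                    # seconds between DNS checks

def resolve(host):
    # Collect the full set of A records, order-independent
    return sorted({info[4][0] for info in socket.getaddrinfo(host, 80, socket.AF_INET)})

last_ips = resolve(HOSTNAME)
while True:
    time.sleep(INTERVAL)
    try:
        current_ips = resolve(HOSTNAME)
    except socket.gaierror:
        continue  # transient DNS failure, try again next round
    if current_ips != last_ips:
        subprocess.run(RELOAD_CMD, check=False)
        last_ips = current_ips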

Upvotes: 1
