Malo87

Reputation: 73

HAProxy and AWS load balancer - 503 error

We've recently split our main web app (which runs on EC2 over HTTPS behind a load balancer with autoscaling) into two separate web modules.

The infrastructure now has one load balancer with n servers for the main module (main.elasticbeanstalk.com) and another load balancer with n servers for the secondary module (secondary.elasticbeanstalk.com).

We've created a dedicated HAProxy instance that the domain www.mycompany.com resolves to, and it proxies requests as follows:

www.mycompany.com/fancymodule -> secondary.elasticbeanstalk.com

www.mycompany.com/ -> main.elasticbeanstalk.com

We put it in production and after ~12 hours http://www.mycompany.com/fancymodule started returning 503 Service Unavailable. If I manually restart HAProxy, everything starts working wonderfully again.

I've managed to replicate the issue by renewing the IP address associated with secondary.elasticbeanstalk.com (e.g. by converting from a load balancer to a single instance).

It seems like HAProxy is not re-resolving the DNS for secondary.elasticbeanstalk.com, so it gets stuck with the old IP and can no longer reach the web server.

And it's not a short downtime! It doesn't route correctly until I restart the service!

Is it possible that the load balancer, since its IP is elastic rather than static, gets associated with a new IP address and is therefore no longer reachable?

Can someone take a look at this config and tell me if I'm doing something stupid?

global
  log     127.0.0.1:514 local2 info
  chroot  /var/lib/haproxy
  pidfile /var/run/haproxy.pid
  maxconn 4000
  user    haproxy
  group   haproxy
  daemon

  # turn on stats unix socket
  stats socket /var/lib/haproxy/stats
  tune.ssl.default-dh-param 2048

defaults
  retries                 3
  timeout http-request    10s
  timeout queue           1m
  timeout connect         10s
  timeout client          1m
  timeout server          1m
  timeout http-keep-alive 10s
  timeout check           10s
  mode    http
  option  httplog

frontend mydomain
  log global
  bind *:80
  bind *:443 ssl crt /etc/ssl/certificate.pem
  acl isSsl ssl_fc
  redirect scheme https if !isSsl
  option dontlog-normal
  mode http

  acl secondaryDomain url_beg /fancymodule

  acl liveDomain hdr_end(Host) -i www.mycompany.com

  use_backend live_secondary if secondaryDomain liveDomain
  use_backend live_main if liveDomain

  default_backend live_main

backend live_main
  rspadd Set-Cookie:\ module=main;Path=/
  server main main.elasticbeanstalk.com:80

backend live_secondary
  rspadd Set-Cookie:\ module=secondary;Path=/
  server secondary secondary.elasticbeanstalk.com:80

listen stats :1234
  mode http
  stats enable
  stats hide-version
  stats realm Haproxy\ Statistics
  stats uri /stats
  stats auth user:pswd

Upvotes: 3

Views: 3019

Answers (2)

Claudio Kuenzler

Reputation: 902

I've hit the same issue with a setup where a backend server was deployed in AWS. The HAProxy running in our internal networks would suddenly take this backend server DOWN with an L7STS/503 check result, while our monitoring was accessing the backend server (directly) just fine. As we run an HAProxy pair (LB01 and LB02), a reload of LB01 immediately worked and the backend server was UP again. On LB02 (not reloaded on purpose) this backend server was still down.

All this seems to be related to a DNS change of the AWS LB and how HAProxy does DNS caching. By default, HAProxy resolves all DNS records (e.g. for backends) at startup/reload. These resolved DNS records then stay in HAProxy's own DNS cache, so you would have to launch a reload of HAProxy to renew the DNS cache.

Another, and without doubt better, solution is to define DNS servers and the HAProxy internal DNS cache TTL. This has been possible since HAProxy version 1.6, with a config snippet like this:

global
[...]

defaults
[...]

resolvers mydns
  nameserver dnsmasq 127.0.0.1:53
  nameserver dns1 192.168.1.1:53
  nameserver dns2 192.168.1.253:53
  hold valid 60s

frontend app-in
  bind *:8080
  default_backend app-out

backend app-out
  server appincloud myawslb.example.com:443 check inter 2s ssl verify none resolvers mydns resolve-prefer ipv4 

What this does is define a DNS nameserver set called mydns, using the DNS servers listed in the entries starting with nameserver. An internal DNS cache is kept for 60s, defined by hold valid 60s. In the backend server's definition you then refer to this nameserver set by adding resolvers mydns. In this example IPv4 addresses are preferred by adding resolve-prefer ipv4 (the default is to use IPv6).

Note that in order to use resolvers in the backend server, check must be defined, too. The DNS lookup happens whenever the backend server check is triggered. In this example check inter 2s is defined, which means a DNS lookup would happen every 2 seconds. That would be quite a lot of lookups. By setting the internal hold cache to 60 seconds, you can limit the number of DNS lookups until the cache expires; a new DNS lookup should therefore happen after 62 seconds at the latest.
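Applied to the backends from the question, this would look roughly as follows. It is only a sketch: the resolvers section name, the check interval and the nameserver address (here 10.0.0.2, typically the VPC resolver) are assumptions you would need to adapt:

resolvers awsdns
  # placeholder nameserver: point this at the VPC resolver or the entries in /etc/resolv.conf
  nameserver vpc 10.0.0.2:53
  hold valid 60s

backend live_main
  rspadd Set-Cookie:\ module=main;Path=/
  server main main.elasticbeanstalk.com:80 check inter 5s resolvers awsdns resolve-prefer ipv4

backend live_secondary
  rspadd Set-Cookie:\ module=secondary;Path=/
  server secondary secondary.elasticbeanstalk.com:80 check inter 5s resolvers awsdns resolve-prefer ipv4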

Starting with HAProxy version 1.8 there is even an advanced possibility called "Service Discovery over DNS" which uses DNS SRV Records. These records contain multiple response fields such as priorities, weights, etc. which can be parsed by HAProxy and update the backends accordingly.
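A minimal, illustrative sketch of such a setup, reusing the mydns resolvers section from above (the SRV record name and the number of server slots are made-up examples):

backend app-out
  # HAProxy 1.8+: up to 5 server slots, filled and updated from the answers to the SRV lookup
  server-template appincloud 5 _https._tcp.myawslb.example.com resolvers mydns check ssl verify none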


Upvotes: 1

Malo87

Reputation: 73

I found out that, to improve performance, HAProxy just replaces the domain names in the config with the actual IP addresses at startup. No DNS resolution is done afterwards.

http://www.serverphorums.com/read.php?10,358605
https://serverfault.com/questions/681000/force-haproxy-to-lookup-dns-for-backend-server

So the solution is either to create an autoscaling load balancer setup with HAProxy, or to automate a reload of the service with an external listener that reacts to the DNS/IP change (a rough sketch follows).
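For the second option, something like the following Python watcher running on the HAProxy host would be enough. This is only a sketch under assumptions: the hostname, polling interval and reload command are placeholders to adapt to your setup.

#!/usr/bin/env python3
# Rough sketch: reload HAProxy whenever the backend hostname resolves to a
# different set of IPs. Hostname, interval and reload command are assumptions.
import socket
import subprocess
import time

HOSTNAME = "secondary.elasticbeanstalk.com"      # backend whose IP may change
RELOAD_CMD = ["systemctl", "reload", "haproxy"]  # adapt to your init system
INTERVAL = 30                                    # seconds between DNS checks

def resolve(host):
    # Collect the full set of A records, order-independent
    return sorted({info[4][0] for info in socket.getaddrinfo(host, 80, socket.AF_INET)})

last_ips = resolve(HOSTNAME)
while True:
    time.sleep(INTERVAL)
    try:
        current_ips = resolve(HOSTNAME)
    except socket.gaierror:
        continue  # transient DNS failure, try again next round
    if current_ips != last_ips:
        subprocess.run(RELOAD_CMD, check=False)
        last_ips = current_ips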

Upvotes: 1
