Gady

Reputation: 4995

504s from nginx on EC2 running Node.js causing 503s at ELB

Issue

I have a Node.js app running on 6 EC2 instances with nginx, all of them behind an ELB. I've been seeing an increase in 504 Gateway Time-out errors from nginx on the EC2 instances. These mark the hosts as unhealthy, the ELB takes them out of service, and eventually the ELB returns 503 Service Unavailable: Back-end server is at capacity.

Question

The increase in 504s from nginx on the EC2 instances is likely due to slow queries or an increase in throughput. Fixing that is obviously the priority, but my main question is:

What is the optimal timeout config for nginx, ELB, etc. to keep them all working together nicely and prevent these domino effects that take down the ELB?

Most of the solutions I've come across deal with Apache or PHP settings, and I'm unsure whether the nginx settings I'm finding really apply to my current setup (should I care about the fastcgi settings or the proxy settings?).

Current Config

Here is a breakdown of my current config. Any other guidance would be much appreciated.

In nginx.conf, I have this:

http {
    ...
    keepalive_timeout 95;
    ...
}

Amazon says to "make sure that the value you set for the keep-alive time is greater than the idle timeout setting on your load balancer", so I'm covered here, since the ELB Idle Timeout is set to 90 seconds. Not sure if I should be using more settings in nginx.conf to not rely on defaults or also look elsewhere for other non-defaults.

I'm also using defaults in Node.js which I believe has a 120000 ms request timeout.

ELB has the following Connection Settings:

Idle Timeout: 90 seconds

ELB has the following Health Check settings:

Ping Protocol: HTTP
Timeout: 59 seconds
Interval: 60 seconds
Unhealthy Threshold: 3
Healthy Threshold: 2
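
The relationships between these numbers can be sanity-checked with a few lines of Node.js. This is only a sketch: the function name `checkTimeouts` and its field names are made up for illustration, and the "seconds to unhealthy" figure is a rough approximation (interval times unhealthy threshold), not an exact ELB guarantee.

```javascript
// Hypothetical helper: checks how the timeouts listed above relate to each other.
function checkTimeouts(cfg) {
  const warnings = [];
  // Amazon's guidance quoted above: keep-alive must exceed the ELB idle timeout.
  if (cfg.nginxKeepaliveSec <= cfg.elbIdleSec) {
    warnings.push('nginx keepalive_timeout should exceed the ELB idle timeout');
  }
  // A health check has to finish before the next one is due.
  if (cfg.healthTimeoutSec >= cfg.healthIntervalSec) {
    warnings.push('health check timeout should be shorter than the interval');
  }
  // Rough estimate of how long a failing host keeps receiving traffic.
  const secondsToUnhealthy = cfg.healthIntervalSec * cfg.unhealthyThreshold;
  return { warnings, secondsToUnhealthy };
}

const result = checkTimeouts({
  nginxKeepaliveSec: 95,
  elbIdleSec: 90,
  healthTimeoutSec: 59,
  healthIntervalSec: 60,
  unhealthyThreshold: 3,
});
console.log(result); // { warnings: [], secondsToUnhealthy: 180 }
```

With the values above, both ordering rules hold, but a host that stops responding may stay in rotation for roughly three minutes before it is marked unhealthy.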

Again, any guidance here is much appreciated.

Upvotes: 0

Views: 1152

Answers (1)

Mark Stosberg

Reputation: 13381

The issue likely has less to do with ELB and Nginx and more to do with your Node.js app.

If the app is blocking the event loop, Nginx will detect the Node.js app as down and then in turn the ELB will consider the host down.
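
One way to check whether the event loop is actually being blocked is Node's built-in `perf_hooks.monitorEventLoopDelay`. The following is a minimal sketch, not a definitive setup: the function names `summarize` and `watchEventLoop`, the 10-second reporting interval, and the 100 ms threshold are all assumptions chosen for illustration.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// The histogram reports nanoseconds; convert the readings to milliseconds.
function summarize(histogram) {
  return {
    meanMs: histogram.mean / 1e6,
    p99Ms: histogram.percentile(99) / 1e6,
    maxMs: histogram.max / 1e6,
  };
}

// Periodically report event loop delay; call onLag when p99 exceeds thresholdMs.
// Returns a function that stops the monitoring.
function watchEventLoop({ intervalMs = 10000, thresholdMs = 100, onLag = console.warn } = {}) {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  const timer = setInterval(() => {
    const stats = summarize(histogram);
    if (stats.p99Ms > thresholdMs) onLag('event loop lag', stats);
    histogram.reset(); // start a fresh window for the next report
  }, intervalMs);
  timer.unref(); // monitoring alone should not keep the process alive
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

Calling `watchEventLoop()` once at app startup is enough. If the warnings line up with the 504s in the Nginx logs, the bottleneck is in the Node.js process rather than in the timeout settings.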

There are some things you can do to help:

  • Make sure you are using all the cores on each instance. You can use the cluster module, or run several node processes and have Nginx load balance between them using the Nginx upstream module.
  • Use Node.js tools to monitor event loop blocking and find out how much the event loop is blocked and what's responsible. Move long-running tasks out to separate processes.
  • Strive for a stateless app, avoiding sticky sessions between Nginx and your app, and don't enable stickiness in the ELB settings. With sticky sessions, a client can get "stuck" being routed to an unresponsive Node.js process. With a stateless, non-sticky design, the client will be routed to a healthy node process faster.
  • Use Nginx to serve the static assets, not Node.js. This will reduce the load on Node.js, which is likely the bottleneck. In the same vein, enable caching at the Nginx level where you can, even brief caches for dynamic responses where it makes sense.
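
For the first point, a minimal sketch of the cluster approach might look like the following. The `WORKERS` environment variable, port 3000, and the function names are assumptions for illustration, and the request handler is a placeholder for the real app.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// cluster.isPrimary exists on newer Node versions; older ones only have isMaster.
const isPrimary = cluster.isPrimary ?? cluster.isMaster;

// One worker per core unless a positive integer is requested explicitly.
function workerCount(requested, cores = os.cpus().length) {
  const n = Number.parseInt(requested, 10);
  return Number.isInteger(n) && n > 0 ? n : cores;
}

function start(port = 3000) {
  if (isPrimary) {
    const count = workerCount(process.env.WORKERS); // WORKERS is a hypothetical override
    for (let i = 0; i < count; i++) cluster.fork();
    // Replace a worker that dies so capacity stays constant.
    cluster.on('exit', (worker) => {
      console.error(`worker ${worker.process.pid} exited, forking a replacement`);
      cluster.fork();
    });
  } else {
    // Placeholder handler; all workers share the same listening port.
    http.createServer((req, res) => res.end('ok\n')).listen(port);
  }
}
```

Call `start()` at the bottom of the entry file. Nginx can then keep proxying to a single port while the cluster module distributes connections across the workers.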

If you still feel there's a problem with the Nginx configuration, the interesting part to post would be the configuration you use to forward traffic from Nginx to Node.js.

You also didn't specify which type of instance you are using for the 6 EC2 instances. Different types have different levels of CPU and network throughput. Depending on your case, you might be better off with fewer beefier boxes, with more cores per box.

Upvotes: 1
