Rad

Reputation: 5012

How to keep TCP connections alive on AWS network loadbalancer

Architecture:
We have a bunch of IoT devices connected via an AWS network loadbalancer (NLB) to our backend servers. This is a bidirectional channel (not a request response style, but messages passed from either party to the other).

Objective:
How to keep connections (both sides of NLB) alive during inactivity.

Description: Clients frequently go idle and neither send anything to nor receive anything from the servers. If this state lasts longer than 350 seconds (the connection idle timeout of NLBs), the LB silently kills the connection. This is bad because we then see a lot of RST packets everywhere.

Questions:

  1. I'm aware of the SO_KEEPALIVE feature and can enable it on our backend servers. This keeps the connection between the backend servers and the NLB alive. But what about the clients? Do NLBs forward TCP keep-alive packets to the other party? (Here it says it does not). If they do not, how do we keep the client connections open? (At the moment, I'm thinking of sending an empty message to keep the connection alive.)
  2. Is this behavior specific to AWS NLBs or do loadbalancers generally work this way?

Upvotes: 6

Views: 16558

Answers (2)

Illia Ivanou

Reputation: 41

The AWS docs say that an NLB TCP listener can keep a connection alive with TCP keep-alive packets: link

For TCP listeners, clients or targets can use TCP keepalive packets to reset the idle timeout.

Based on my tests, the client receives the TCP keep-alive packets sent by the server and responds to them correctly. The server doesn't drop the connection, which means it receives the client's response. So the NLB TCP listener does forward keep-alive packets.
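As a rough illustration of the keep-alive setup discussed here, the following is a minimal sketch that enables SO_KEEPALIVE on a socket and, where the platform exposes them, sets the probe timers below the 350 second idle timeout. The function name `enable_keepalive` and the default timer values (120 s idle, 30 s interval, 3 probes) are assumptions chosen for the example, not values from the AWS docs. `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are the Linux option names, so the code skips any that the running platform does not define.

```python
import socket

NLB_IDLE_TIMEOUT_S = 350  # idle timeout quoted in the question


def enable_keepalive(sock, idle_s=120, interval_s=30, probes=3):
    """Enable TCP keep-alive on sock (hypothetical helper, example values)."""
    if idle_s >= NLB_IDLE_TIMEOUT_S:
        raise ValueError("idle_s must be shorter than the NLB idle timeout")

    # Ask the kernel to send keep-alive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Timer options are platform specific (Linux names below), so only set
    # the ones this Python build actually exposes.
    timers = (
        ("TCP_KEEPIDLE", idle_s),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before the connection is dropped
    )
    for name, value in timers:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

The same call would be made on an accepted server-side socket or on a client socket before or after connecting. Without the timer options, the operating system default idle time applies, which on Linux is typically far longer than 350 seconds.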

According to the same docs, an NLB TLS listener should not treat TCP keep-alive packets the same way:

TCP keepalive packets are not supported for TLS listeners.

But the actual test results shocked me: Wireshark showed keep-alive packets arriving at a client connected through a TLS listener. The results of the tests I ran 2 months ago don't match what I'm seeing now, so I think the behaviour may have changed. (Previously, the server kept the connection open even after the client became unavailable unexpectedly.)

Upvotes: 4

Rad

Reputation: 5012

Not an answer, just to document what I found/did:

  1. NLBs do not forward keep-alive packets, meaning you have to enable them on both the server and the clients.
  2. The NLB's idle timeout cannot be changed. It is 350 seconds.
  3. I couldn't find any way to forge an empty TCP packet that would fool the LB into forwarding it to the other side.

In the end, we implemented the keep-alive feature at the application layer (periodically sending an empty message to the clients).
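A minimal sketch of what such an application-layer keep-alive could look like is below. It is not our actual implementation: the class name `HeartbeatSender`, the 120 second interval, and the framing (a 4-byte big-endian length prefix, with an "empty message" being a frame of length zero) are all assumptions made for the example.

```python
import socket
import struct
import threading
import time

HEARTBEAT_INTERVAL_S = 120  # assumed value, comfortably under the 350 s timeout


class HeartbeatSender:
    """Sends an empty frame whenever nothing has been sent for interval_s."""

    def __init__(self, sock, interval_s=HEARTBEAT_INTERVAL_S):
        self._sock = sock
        self._interval_s = interval_s
        self._lock = threading.Lock()       # serializes writes to the socket
        self._last_send = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def send(self, payload: bytes):
        # Hypothetical framing: 4-byte big-endian length, then the payload.
        frame = struct.pack("!I", len(payload)) + payload
        with self._lock:
            self._sock.sendall(frame)
            self._last_send = time.monotonic()

    def _run(self):
        # Wake up a few times per interval and send an empty frame only if
        # the connection has been quiet for the whole interval.
        while not self._stop.wait(self._interval_s / 4):
            if time.monotonic() - self._last_send >= self._interval_s:
                try:
                    self.send(b"")
                except OSError:
                    return  # connection is gone; let the owner handle it
```

All regular traffic would go through `send()` so that real messages also reset the timer, and the receiving side has to treat a zero-length frame as a no-op. Because the heartbeat is ordinary payload data, the load balancer sees it as traffic on the connection regardless of how it handles TCP keep-alive probes.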

Upvotes: 3
