Sv443
Sv443

Reputation: 749

Trace a request going through the clearnet / Cloudflare / Apache to precisely find out performance issues

I am hosting a RESTful API and my problem is that every first inbound request after a certain time will take about three seconds, compared to the normal ~100ms.
What I find most interesting is that it is always takes exactly 3100 to around 3250 milliseconds, not more and not less. So it seems pretty intentional to me.

I've already debugged the API and everything runs pretty much instantly except for one thing and that is this three second delay before my API even starts to receive the request.
My best guess is that something went wrong either in Apache or the DNS resolution but I don't know what exactly causes it (that's why I'm asking this question).

I am using the Apache ProxyPass like this:

ProxyRequests off
Timeout 54
ProxyTimeout 5400
ProxyPass /jokeapi http://localhost:8079
ProxyPassReverse /jokeapi http://localhost:8079

I'm using the Cloudflare/APNIC DNS gateway servers 1.1.1.1 and 0.0.0.0
Additionally, all my requests get routed through a Cloudflare SSL proxy before even reaching my network.

I've even partially rewritten the API so it responds with ReadStreams instead of loading the files into RAM and serving it at once but that didn't fix the problem.

My question is how I can fully debug the route a request takes and see precisely where this 3 second delay comes from.

Thanks!

PS: the server runs on NodeJS

Upvotes: 4

Views: 89

Answers (1)

James Pulley
James Pulley

Reputation: 5692

I think the key is not related to network activity, but in the note that after a period of idle activity the first response to the API in a while requires slightly over 3 seconds. I am assuming that follow up actions are back to the 100ms window.

As you are using localhost, this is not a routing issue. If you want, you can just as easily use loopback, 127.0.0.1, to avoid a name resolution hit, but such a hit on a reserved hostname would be microseconds.

I suspect that the compiled version of your RESTful function has aged out of the cache for your system. The first hit after a period of non-use time then requires a recompile, and so long as the compiled instructions are exercised for a period of time they will remain in cache and contoninue to respond in the 100ms range. We observe this condition quite often in multiuser performance testing after cold boots of systems (setting initial conditions). Ramp-ups of the test users take the hit for the recompiles of common code before hitting the time under full load.

Another item to strike back at the network side of the house, DNS timeouts and bind cache entries tend to be quite long, usually significant portions of a day or even longer. Even so, the odds that a DNS lookup for an item which has aged out of the bind cache would not add three seconds to your initial connection time.

Upvotes: 1

Related Questions