Good Eats
Good Eats

Reputation: 333

How to debug long wait times for Django Website

I have a Django website whose response time I would like to improve. When I click intra-site links on my site the result is either an immediate loading of the next page or a 20-30 second wait before the page loads. I find no reproducible patterns in this behavior to help me identify a fix. I realize that there are many, many reasons why this might be the case and much more information on my specific configuration would be required for specific help in this area.

However, instead of dumping pages of config info and asking for specific suggestions, I hope others can provide suggestions as to general areas I should investigate that would be consistent with the following observation:

Debug-Toolbar shows that total CPU time & SQL query times are in a reasonable range (< 1 sec), however the total browser request time is 22 seconds (see image). Why might these values be so different? What might account for several seconds of request time that wouldn't also fall under CPU-time?

enter image description here


UPDATE 1: I was able to see a more verbose log from Django after modifying the --access-logfile flag in gunicorn (as described here: http://docs.gunicorn.org/en/stable/settings.html#logging). This showed me that in the cases where the lag takes place, Django does not receive the request until the gunicorn worker restarts (which takes about 30 seconds):

web_1 | 2019-07-23 15:33:06 +0000 [CRITICAL] WORKER TIMEOUT (pid:9)
web_1 | [2019-07-23 11:33:06 -0400] [9] [INFO] Worker exiting (pid: 9)
web_1 | [2019-07-23 15:33:06 +0000] [10] [INFO] Booting worker with pid: 10

Now I just need to track down why my gunicorn workers are malfunctioning.


UPDATE 2: I added the -w 4 flag to gunicorn (this flag was not previously specified) and the problem seems to have vanished. I'll keep testing and see if this is a long-term fix.

Upvotes: 2

Views: 1883

Answers (1)

Good Eats
Good Eats

Reputation: 333

Okay, here is the answer I found:

The Debug toolbar's CPU time only reflects the time taken within the Django code. The fact that the total request time was so much longer than the CPU time reflects time taken by a combination of other, server-side, non-Django code. Thus the fix lies not in django, but outside in the rest of the server setup. The usual suspects for typical django deployments is everything in front of Django itself (e.g. ngix, gunicorn, etc.).

Learning more about gunicorn flags (specifically --access-logfile) allowed me to see the error messages produced by a gunicorn worker that was repeatedly (but not reproducibly) timing out. While it is still unknown why the timeout was occurring, changing from one to 4 workers (using -w 4 flag) has solved the original issue of 30 second page load delays.

Upvotes: 1

Related Questions