How to tell when a Flask server using waitress is overloaded

Question

I have a simple Flask application that runs a machine learning model on data sent in a POST request to an endpoint (say /predict). The app is served by waitress in production with the default parameters. Since prediction can take a while, I have a readiness endpoint that I would like to return a "not ready" 5xx status code (such as 503) when the waitress task queue depth exceeds some number (let's say 5).

I need to know how to get the size of waitress's task queue. Waitress does log "Task queue depth is 94" to stdout, but I can't find a way to access the value programmatically. I would then use that number to decide whether my server is ready to accept more requests or whether I need to spin up new instances.
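
To make the idea concrete, here is a minimal sketch of one way the logged value could be captured without touching waitress internals: attach a `logging.Handler` to the logger that emits the message and parse the number out of it. The logger name `waitress.queue` is an assumption that should be checked against the installed waitress version; the message format is the one quoted above. `QueueDepthHandler`, `is_overloaded`, `threshold`, and `max_age` are hypothetical names introduced only for illustration.

```python
import logging
import re
import time

# Assumption: waitress emits the "Task queue depth is N" warning on a logger
# named "waitress.queue". Verify this against your installed version.
QUEUE_LOGGER_NAME = "waitress.queue"
DEPTH_PATTERN = re.compile(r"Task queue depth is (\d+)")


class QueueDepthHandler(logging.Handler):
    """Hypothetical helper: remembers the most recent logged queue depth."""

    def __init__(self):
        super().__init__()
        self.last_depth = 0
        self.last_seen = None  # time.monotonic() of the last depth message

    def emit(self, record):
        # Ignore any record that is not a queue depth message.
        match = DEPTH_PATTERN.search(record.getMessage())
        if match:
            self.last_depth = int(match.group(1))
            self.last_seen = time.monotonic()

    def is_overloaded(self, threshold=5, max_age=10.0):
        """True if a depth above `threshold` was logged within `max_age` seconds."""
        if self.last_seen is None:
            return False
        recent = (time.monotonic() - self.last_seen) <= max_age
        return recent and self.last_depth > threshold


# Register the handler once at startup, before serving requests.
depth_handler = QueueDepthHandler()
logging.getLogger(QUEUE_LOGGER_NAME).addHandler(depth_handler)
```

The readiness endpoint could then return 503 whenever `depth_handler.is_overloaded()` is true. The `max_age` window is there because the message appears to be logged only while requests are backing up, so a depth of zero would never be reported and the last value would otherwise go stale. This is a workaround built on log output, not a supported waitress API for reading the queue size.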
