Reputation: 31
I have my own TensorFlow Serving server hosting multiple neural networks. Now I want to estimate the load on it. Does anybody know how to get the current number of requests in the queue in TensorFlow Serving? I tried using Prometheus, but it doesn't expose such a metric.
Upvotes: 2
Views: 1556
Reputation: 46
What's more, you can set the number of threads with the --rest_api_num_threads flag, or leave it unset and let TF Serving configure it automatically.
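For example, a minimal launch command sketch (the port, model name, and model path here are assumptions, not taken from the question):

    tensorflow_model_server \
      --rest_api_port=8501 \
      --rest_api_num_threads=16 \
      --model_name=my_model \
      --model_base_path=/models/my_model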
Upvotes: 0
Reputation: 46
Actually, TF Serving doesn't have a request queue, which means it won't queue up requests when there are too many of them.
The only thing TF Serving does is allocate a thread pool when the server is initialized.
When a request comes in, TF Serving uses a free thread to handle it; if there are no free threads, it returns an unavailable error, and the client should retry later.
You can find this information in the comments of tensorflow_serving/batching/streaming_batch_scheduler.h.
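On the client side, the practical consequence is to retry on UNAVAILABLE. Here is a rough sketch using the gRPC client from tensorflow_serving_api (the host, model name, input tensor, and backoff values are all assumptions):

    import time

    import grpc
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    # Assumed serving host/port and model name.
    channel = grpc.insecure_channel("localhost:8500")
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = "my_model"  # hypothetical model name
    request.inputs["input"].CopyFrom(tf.make_tensor_proto([[1.0, 2.0]]))

    # If the thread pool is exhausted, the server answers with UNAVAILABLE;
    # back off and retry instead of treating it as a fatal error.
    for attempt in range(5):
        try:
            response = stub.Predict(request, timeout=10.0)
            break
        except grpc.RpcError as err:
            if err.code() == grpc.StatusCode.UNAVAILABLE:
                time.sleep(2 ** attempt)  # exponential backoff
            else:
                raise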
Upvotes: 1