AlexSomov

Reputation: 31

Tensorflow Serving number of requests in queue

I have my own TensorFlow serving server for multiple neural networks. Now I want to estimate the load on it. Does somebody know how to get the current number of requests in a queue in TensorFlow serving? I tried using Prometheus, but there is no such option.

Upvotes: 2

Views: 1556

Answers (2)

喻润洋

Reputation: 46

What's more, you can set the number of REST API worker threads with the --rest_api_num_threads flag, or leave it unset and let TF Serving configure it automatically.
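As a sketch, a server launch with an explicit thread count might look like this (the model name and path are placeholders; adjust to your deployment):

```shell
# Start TF Serving with 16 REST API worker threads instead of the default.
tensorflow_model_server \
  --rest_api_port=8501 \
  --rest_api_num_threads=16 \
  --model_name=my_model \
  --model_base_path=/models/my_model
```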

Upvotes: 0

喻润洋

Reputation: 46

Actually, TF Serving doesn't have a request queue, which means it won't queue up requests when there are too many of them. The only thing TF Serving does is allocate a thread pool when the server is initialized.
When a request comes in, TF Serving uses a free thread to handle it; if no threads are free, TF Serving returns an unavailable error, and the client should retry later. You can find this information in the comments of tensorflow_serving/batching/streaming_batch_scheduler.h
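Since the server simply rejects requests when the pool is exhausted, the client side is where retries belong. A minimal retry-with-backoff sketch, using a hypothetical `fake_predict` stub in place of a real TF Serving call (which would raise a gRPC UNAVAILABLE status instead):

```python
import time

class UnavailableError(Exception):
    """Stands in for the UNAVAILABLE status a saturated server returns."""

def call_with_retry(predict, max_retries=5, base_delay=0.01):
    """Retry `predict` with exponential backoff when no thread is free."""
    for attempt in range(max_retries):
        try:
            return predict()
        except UnavailableError:
            time.sleep(base_delay * (2 ** attempt))
    raise UnavailableError("server still saturated after retries")

# Hypothetical server stub: the first two calls find no free thread.
calls = {"n": 0}
def fake_predict():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise UnavailableError()
    return {"outputs": [0.9]}

result = call_with_retry(fake_predict)
print(result)  # → {'outputs': [0.9]}
```

Exponential backoff spreads out retries so a briefly saturated server isn't hammered with immediate re-sends.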

Upvotes: 1
