Reputation: 31
I have my own TensorFlow Serving server hosting multiple neural networks. Now I want to estimate the load on it. Does anybody know how to get the current number of requests in the queue in TensorFlow Serving? I tried using Prometheus, but it doesn't expose such a metric.
Upvotes: 2
Views: 1556
Reputation: 46
What's more, you can set the number of threads with the --rest_api_num_threads flag, or leave it unset and let TF Serving configure it automatically.
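For example, a minimal launch command sketch (the port, model name, and model path here are assumptions, not taken from the question):

    tensorflow_model_server \
      --rest_api_port=8501 \
      --rest_api_num_threads=16 \
      --model_name=my_model \
      --model_base_path=/models/my_model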
Upvotes: 0
Reputation: 46
Actually, TF Serving doesn't have a request queue, which means it won't queue up requests when there are too many of them.
The only thing TF Serving does is allocate a thread pool when the server is initialized.
When a request comes in, TF Serving uses a free thread to handle it; if there are no free threads, it returns an unavailable error, and the client should retry later.
You can find this information in the comments of tensorflow_serving/batching/streaming_batch_scheduler.h.
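On the client side, the practical consequence is to retry on UNAVAILABLE. Here is a rough sketch using the gRPC client from tensorflow_serving_api (the host, model name, input tensor, and backoff values are all assumptions):

    import time

    import grpc
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    # Assumed serving host/port and model name.
    channel = grpc.insecure_channel("localhost:8500")
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = "my_model"  # hypothetical model name
    request.inputs["input"].CopyFrom(tf.make_tensor_proto([[1.0, 2.0]]))

    # If the thread pool is exhausted, the server answers with UNAVAILABLE;
    # back off and retry instead of treating it as a fatal error.
    for attempt in range(5):
        try:
            response = stub.Predict(request, timeout=10.0)
            break
        except grpc.RpcError as err:
            if err.code() == grpc.StatusCode.UNAVAILABLE:
                time.sleep(2 ** attempt)  # exponential backoff
            else:
                raise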
Upvotes: 1