Sid M

Reputation: 239

Tensorflow Object Detection API has slow inference time with tensorflow serving

I am unable to match the inference times Google reports for models released in their model zoo. Specifically, I am trying out their faster_rcnn_resnet101_coco model, for which the reported inference time is 106 ms on a Titan X GPU.

My serving system is using TF 1.4 running in a container built from the Dockerfile released by Google. My client is modeled after the inception client also released by Google.

I am running on Ubuntu 14.04 with TF 1.4 and a single Titan X. My total inference time is about 3x worse than Google's figure, at ~330 ms: making the tensor proto takes ~150 ms and the Predict call takes ~180 ms. My saved_model.pb is taken directly from the tar file downloaded from the model zoo. Is there something I am missing? What steps can I take to reduce the inference time?
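For reference, this is roughly how I separate the two timings in my inception_client-style gRPC client (a minimal sketch; the host, port, model name, and the 'inputs' key are assumptions about my setup, not values from the model zoo):

    import time
    import tensorflow as tf
    from grpc.beta import implementations
    from tensorflow_serving.apis import predict_pb2
    from tensorflow_serving.apis import prediction_service_pb2

    channel = implementations.insecure_channel('localhost', 9000)
    stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

    with open('image.jpg', 'rb') as f:
        image_bytes = f.read()

    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'faster_rcnn_resnet101_coco'  # assumed model name
    request.model_spec.signature_name = 'serving_default'

    t0 = time.time()
    # builds the TensorProto from the encoded image bytes (~150 ms for me)
    request.inputs['inputs'].CopyFrom(
        tf.contrib.util.make_tensor_proto(image_bytes, shape=[1]))
    t1 = time.time()
    result = stub.Predict(request, 30.0)  # gRPC call (~180 ms for me)
    t2 = time.time()

    print('make_tensor_proto: %.1f ms' % ((t1 - t0) * 1000.0))
    print('Predict:           %.1f ms' % ((t2 - t1) * 1000.0))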

Upvotes: 6

Views: 2754

Answers (4)

Sid M

Reputation: 239

I was able to solve the two problems by

  1. optimizing the compiler flags. I added the following to the bazel build command: --config=opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma

  2. not importing tf.contrib for every inference. In the inception_client example provided by Google, the line that builds the tensor proto goes through tf.contrib, which is resolved again on every forward pass (see the sketch below).
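A minimal sketch of the second fix, assuming your client builds its request the same way inception_client does (build_request and image_bytes are illustrative names, not taken from the actual client):

    # Slow: going through tf.contrib inside the request path pays the
    # contrib resolution cost on every call.
    #   request.inputs['inputs'].CopyFrom(
    #       tf.contrib.util.make_tensor_proto(image_bytes, shape=[1]))

    # Fast: bind make_tensor_proto once, at module load time.
    from tensorflow.contrib.util import make_tensor_proto

    def build_request(request, image_bytes):
        # No tf.contrib lookup per call; only the proto is built here.
        request.inputs['inputs'].CopyFrom(
            make_tensor_proto(image_bytes, shape=[1]))
        return request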

Upvotes: 3

gustavz

Reputation: 3170

@Vikram Gupta did you check your GPU usage? Does it get anywhere near 80-100%? I experience very low GPU usage when detecting objects in a video stream with the API and the models from the model zoo.

Upvotes: 0

Vikram Gupta

Reputation: 121

I ran a similar model with a Titan Xp; however, I used the infer_detections.py script and logged the forward-pass time by taking datetime timestamps before and after the call to detection_inference.infer_detections_and_add_to_example (see the sketch below). I had reduced the number of proposals generated in the first stage of Faster R-CNN from 300 to 100, and reduced the number of detections at the second stage to 100 as well. I got numbers in the range of 80 to 140 ms, and I think a 600x600 image would take ~106 ms or slightly less in this setup (due to the Titan Xp and the reduced complexity of the model). Maybe you can repeat the above process on your hardware: if the numbers are also ~106 ms for this case, we can attribute the difference to the use of the Dockerfile and the client. If the numbers are still high, then perhaps it is the hardware.
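The timing wrapper was essentially this (a sketch; detection_inference, the tensors, and FLAGS come from infer_detections.py, and the print line is illustrative):

    from datetime import datetime

    start = datetime.now()
    # the forward pass being measured, exactly as called in infer_detections.py
    tf_example = detection_inference.infer_detections_and_add_to_example(
        serialized_example_tensor, detected_boxes_tensor,
        detected_scores_tensor, detected_labels_tensor,
        FLAGS.discard_image_pixels)
    elapsed_ms = (datetime.now() - start).total_seconds() * 1000.0
    print('forward pass: %.1f ms' % elapsed_ms)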

It would be helpful if someone from the TensorFlow Object Detection team could comment on the setup used to generate the numbers in the model zoo.

Upvotes: 0

Vikram Gupta

Reputation: 121

Non-max suppression may be the bottleneck: https://github.com/tensorflow/models/issues/2710.

Is the image size 600x600?

Upvotes: 2
