Isaac

Reputation: 175

TensorFlow prediction speed increases with the same image

I'm using tensorflow-gpu 2.5.0 for an object detection model. In the model, I'm calling the prediction function using model(x). When I run the testing in batch, I found that repeatedly calling the prediction on the same image results in faster inference. This gives an inaccurate inference time estimate for my model. What is the reason behind the speed improvement?

The example below shows the inference time results from repeated inference on three different images. It was executed in a loop, where the format is <image number> - <inference time>. The inference speed becomes faster after the model has seen image 1 and image 2 once. When I add a new image 3, the model takes longer to predict the first time and subsequently becomes faster as well.

1 - 3.5671939849853516 seconds
2 - 1.1461808681488037 seconds
1 - 0.07942032814025879 seconds
2 - 0.08655834197998047 seconds
1 - 0.0813601016998291 seconds
2 - 0.08380460739135742 seconds
1 - 0.07466459274291992 seconds
2 - 0.08526778221130371 seconds
3 - 1.0617506504058838 seconds
3 - 0.07965445518493652 seconds
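For reference, this is roughly the loop I used to measure it (a minimal sketch; the model and the random stand-in images are placeholders for my actual loaded detection model and test images):

```python
import time
import numpy as np
import tensorflow as tf

# Placeholder model and images; in my code, `model` is the loaded
# object detection model and the images are read from disk.
model = tf.keras.Sequential([tf.keras.layers.Conv2D(8, 3)])
images = {i: np.random.rand(640, 640, 3).astype(np.float32) for i in (1, 2, 3)}

for number in [1, 2, 1, 2, 1, 2, 1, 2, 3, 3]:
    x = tf.convert_to_tensor(images[number][None, ...])
    start = time.time()
    model(x)  # direct call, not model.predict()
    print(f"{number} - {time.time() - start} seconds")
```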

Upvotes: 2

Views: 581

Answers (1)

nessuno

Reputation: 27042

Calling model(x) is the same as calling any @tf.function-decorated function with an input x.

I covered these scenarios in a 3-part article series; the most useful one for your case is part 2.

What happens is that at the first invocation with a certain input type, you are defining the so-called "ConcreteFunction": that's the execution of the Python code + the tracing + the AutoGraph invocation + the creation of the tf.Graph object, and the subsequent caching of this object.
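You can see this tracing behavior with a toy @tf.function (a minimal sketch, not your detection model): the Python body, including the print, runs only while the graph is being traced, so it fires on the first call per input signature and is skipped on cache hits.

```python
import tensorflow as tf

@tf.function
def f(x):
    # This print runs only during tracing, i.e. while the tf.Graph
    # is being built for a new input signature.
    print("tracing for shape", x.shape)
    return x * 2

f(tf.ones((1, 3)))  # slow: traces, builds and caches the ConcreteFunction
f(tf.ones((1, 3)))  # fast: cache hit, the print does not run
f(tf.ones((2, 3)))  # new input shape -> new trace, the print runs again
```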

That's why the first call is so slow.

The other calls are way faster because you just look up the tensor type (dtype and shape) in the cache map, find the correct tf.Graph, and run the forward pass using the already-created graph.
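In practice, if you want a realistic inference time estimate, warm the model up first with one call per input shape you plan to feed, and only time the subsequent calls. A minimal sketch (`model` stands in for your loaded detection model):

```python
import time

def timed_inference(model, x, warmup=1, runs=10):
    # Warm-up calls trigger the tracing/graph construction so it is
    # excluded from the measurement.
    for _ in range(warmup):
        model(x)
    start = time.time()
    for _ in range(runs):
        model(x)
    return (time.time() - start) / runs  # average seconds per call
```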

Upvotes: 3
