Reputation: 101
I know this has something to do with how TensorFlow builds the graph, and that it doesn't do it very... "efficiently". Here's the dummy code I'm running:
import tensorflow as tf

@tf.function
def parTest(x_in):
    res = 0
    # Plain Python range: the loop runs in Python while the graph is traced
    for i in range(5000):
        res += x_in + i
    return res
Running that function without TensorFlow takes 0.002 seconds, but running it with TensorFlow takes between 10 and 20 seconds. This makes no sense to me. What's going on here, and how do I fix it? The actual value of res here can obviously be calculated in a more efficient way, but the real problem I'm having is that I have a for loop with lots of iterations that can be run independently of each other, yet TensorFlow refuses to do this and runs them really slowly one by one, just like in this dummy example. So how do I tell TensorFlow not to do this?
Upvotes: 3
Views: 2357
Reputation: 59731
Loops are never very efficient in TensorFlow. However, this function is particularly bad for TensorFlow, because it will try to "unroll" the whole loop statically. That is, it will not "translate" your function into a tf.while_loop, but instead will literally create 5000 copies of the loop body's operations, one per iteration. That is a very big graph, which on top of that will always run sequentially. I actually get a warning about this in TensorFlow 2.2.0, which points you to this information page: "WARNING: Large unrolled loop detected".
As mentioned in that link, the problem is that TensorFlow cannot (at least at the moment) detect loops over arbitrary iterators, not even if they are a simple range, so it just runs the loop in Python and creates the corresponding operations. You can avoid that either by writing the tf.while_loop yourself or, thanks to AutoGraph, simply by replacing your range with a tf.range:
import tensorflow as tf

@tf.function
def parTest(x_in):
    res = 0
    for i in tf.range(5000):
        res += x_in + i
    return res
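One way to see the difference is to count the operations in the traced graph for each version. The following is only a rough sketch: the function names unrolled and looped are made up for illustration, N is reduced to 200 so that tracing stays quick, and the input is assumed to be a scalar int32 tensor. The exact counts will likely vary between TensorFlow versions, so only the relative sizes matter.

```python
import tensorflow as tf

N = 200  # smaller than the 5000 in the question, so tracing stays quick

@tf.function
def unrolled(x_in):
    res = 0
    for i in range(N):  # Python range: the body is copied N times at trace time
        res += x_in + i
    return res

@tf.function
def looped(x_in):
    res = 0
    for i in tf.range(N):  # tf.range: AutoGraph builds one graph-level loop
        res += x_in + i
    return res

x = tf.constant(1)  # assumption: scalar int32 input

# Trace each function once and count the ops in its top-level graph.
n_unrolled = len(unrolled.get_concrete_function(x).graph.get_operations())
n_looped = len(looped.get_concrete_function(x).graph.get_operations())
print("ops with range:   ", n_unrolled)
print("ops with tf.range:", n_looped)
```

The range version's op count grows with N, while the tf.range version's top-level graph stays about the same size no matter how many iterations there are.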
Still, writing your own tf.while_loop (only when a loop is truly necessary, since vectorized operations will almost always be faster) gives you more explicit control over details like the parallel_iterations parameter.
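As a rough sketch of both options for the dummy function, assuming x_in is a scalar int32 tensor (the names par_test_while and par_test_vectorized are made up here, and parallel_iterations=10 is simply the documented default written out explicitly):

```python
import tensorflow as tf

N = 5000  # same iteration count as in the question

@tf.function
def par_test_while(x_in):
    # Loop state: the counter i and the running sum res.
    i0 = tf.constant(0)
    res0 = tf.zeros_like(x_in)

    def cond(i, res):
        return i < N

    def body(i, res):
        return i + 1, res + x_in + i

    _, res = tf.while_loop(cond, body, [i0, res0], parallel_iterations=10)
    return res

@tf.function
def par_test_vectorized(x_in):
    # Same sum without a loop: build all N terms at once and reduce them.
    # Assumes x_in is a scalar; other shapes would need an explicit axis.
    i = tf.range(N, dtype=x_in.dtype)
    return tf.reduce_sum(x_in + i)

x = tf.constant(1)
print(par_test_while(x).numpy(), par_test_vectorized(x).numpy())
```

Note that in this particular loop every iteration depends on the previous value of res, so parallel_iterations has little to work with. It matters more when the iterations really are independent of each other.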
Upvotes: 2