Reputation: 31
So I've got some code
tensors = []  # it's filled with 3D float tensors
total = sum(tensors)
if I change that last line to
total = tf.add_n(tensors)
then the code produces the same output but runs much more slowly and soon causes an out-of-memory exception. What's going on here? Can someone explain how Python's built-in sum function and tf.add_n each interact with a list of tensors, and why Python's sum would seemingly just be a better version?
Upvotes: 3
Views: 2288
Reputation: 24581
When you use sum, you call a standard Python algorithm that calls __add__ recursively on the elements of the list. Since __add__ (or +) is indeed overloaded on TensorFlow's tensors, it works as expected: it creates a graph that can be executed during a session. It is not optimal, however, because you add as many operations as there are elements in your list; also, you are enforcing the order of the operations (add the first two elements, then the third to the result, and so on), which is also not optimal.
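As a rough sketch of what sum does here (my illustration, not code from the question), it is essentially a left fold over +, one add node per element:

import functools
import operator

import tensorflow as tf

xs = [tf.zeros(()) for _ in range(10)]

# sum(xs) is roughly ((((0 + xs[0]) + xs[1]) + xs[2]) + ...):
# each + adds one node to the graph, and the chain fixes the
# evaluation order. The 0 start value works because tensors
# implement __radd__.
total = functools.reduce(operator.add, xs, 0)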
By contrast, add_n is a specialized operation to do just that. Looking at the graph is really telling, I think:
import tensorflow as tf

# sum builds a chain of binary add ops: one node per element,
# in a fixed left-to-right order.
with tf.variable_scope('sum'):
    xs = [tf.zeros(()) for _ in range(10)]
    sum(xs)

# add_n builds a single AddN node that consumes all ten inputs.
with tf.variable_scope('add_n'):
    xs = [tf.zeros(()) for _ in range(10)]
    tf.add_n(xs)
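If you want to see this for yourself, one way (my addition, assuming TF 1.x and TensorBoard) is to dump the graph to disk:

# Write the graph definition for inspection with
# `tensorboard --logdir=logs`.
writer = tf.summary.FileWriter('logs', tf.get_default_graph())
writer.close()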
However, contrary to what I thought earlier, add_n takes up more memory, because it waits for all of its inputs to be available, and stores them, before performing the summation. If the number of inputs is large, the difference can be substantial.
The behavior I was expecting from add_n, that is, summation of inputs as they become available, is actually achieved by tf.accumulate_n. This should be the superior alternative: it takes less memory than add_n but does not enforce the order of summation like sum.
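For concreteness, a minimal sketch of the swap (my addition; accumulate_n requires that all inputs share the same shape and dtype):

import tensorflow as tf

xs = [tf.zeros((), dtype=tf.float32) for _ in range(10)]

# accumulate_n sums inputs as they become available instead of
# buffering them all first; all inputs must have the same shape
# and dtype.
total = tf.accumulate_n(xs)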
Why did the authors of tensorflow-wavenet use sum instead of tf.accumulate_n? Certainly because this function was not differentiable before TF 1.7. So if you have to support TF < 1.7 and be memory efficient, good old sum
Upvotes: 6
Reputation: 21
The sum() built-in only takes iterables and would therefore seem to gain the advantage of using generators with regard to memory profile.
The add_n() function for tensors takes a list of tensors and seems to retain that data structure throughout handling, based on its requirement for shape comparison.
In [29]: y = [1,2,3,4,5,6,7,8,9,10]
In [30]: y.__sizeof__()
Out[30]: 120
In [31]: x = iter(y)
In [32]: x.__sizeof__()
Out[32]: 32
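As an illustration of this difference (my sketch, assuming TF 1.x): sum() can consume a generator directly, while add_n() needs a concrete list up front:

import tensorflow as tf

# sum() accepts any iterable, so the intermediate Python list of
# tensor objects never has to be materialized:
total_sum = sum(tf.zeros(()) for _ in range(10))

# add_n() requires a list (or tuple) of tensors, since it inspects
# their shapes up front, so the full list must exist in memory:
total_add_n = tf.add_n([tf.zeros(()) for _ in range(10)])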
Upvotes: 1