user2458126

Reputation: 31

python sum on array of tensors vs tf.add_n

So I've got some code

tensors = []  # it's filled with 3D float tensors
total = sum(tensors)

if I change that last line to

total = tf.add_n(tensors)

then the code produces the same output but runs much more slowly and soon causes an out-of-memory exception. What's going on here? Can someone explain how Python's built-in sum function and tf.add_n each interact with an array of tensors, and why Python's sum would seemingly just be a better version?

Upvotes: 3

Views: 2288

Answers (2)

P-Gn

Reputation: 24581

When you use sum, you call a standard Python algorithm that calls __add__ repeatedly on the elements of the list. Since __add__ (or +) is indeed overloaded on TensorFlow's tensors, it works as expected: it creates a graph that can be executed during a session. It is not optimal, however, because you add as many operations as there are elements in your list; also, you are enforcing the order of the operations (add the first two elements, then the third to the result, and so on), which is also not optimal.
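
To make the chaining concrete, here is a minimal sketch of what sum(tensors) effectively builds; note that sum starts from the integer 0, which works on tensors thanks to the overloaded __radd__:

total = 0                # sum starts from the integer 0
for t in tensors:
    total = total + t    # each iteration adds one more add node to the graph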

By contrast, add_n is a specialized operation to do just that. Looking at the graph is really telling, I think:

import tensorflow as tf

with tf.variable_scope('sum'):
  xs = [tf.zeros(()) for _ in range(10)]
  sum(xs)         # graph: a chain of add nodes, one per element

with tf.variable_scope('add_n'):
  xs = [tf.zeros(()) for _ in range(10)]
  tf.add_n(xs)    # graph: a single AddN node taking all inputs at once

[Graph visualization: the sum scope contains a chain of add nodes, while the add_n scope contains a single AddN node.]
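
If you want to reproduce the visualization yourself, here is a minimal sketch for TF 1.x; the log directory is an arbitrary choice:

# dump the default graph so TensorBoard can render it
writer = tf.summary.FileWriter('/tmp/sum_vs_add_n', tf.get_default_graph())
writer.close()
# then run: tensorboard --logdir /tmp/sum_vs_add_n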

However, contrary to what I thought earlier, add_n takes up more memory because it waits for, and stores, all of its inputs before computing the sum. If the number of inputs is large, the difference can be substantial.

The behavior I was expecting from add_n, that is, summation of inputs as they become available, is actually achieved by tf.accumulate_n. This should be the superior alternative: it takes less memory than add_n, but does not enforce the order of summation the way sum does.
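
A minimal sketch of the drop-in replacement, assuming TF 1.x and a list of same-shaped tensors (the example tensors here are stand-ins):

tensors = [tf.zeros((2, 3, 4)) for _ in range(10)]  # stand-in 3D float tensors
total = tf.accumulate_n(tensors)  # sums inputs as they become available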

Why did the authors of tensorflow-wavenet use sum instead of tf.accumulate_n? Certainly because this function was not differentiable on TF < 1.7. So if you have to support TF < 1.7 and be memory efficient, good old sum is actually quite a good option.

Upvotes: 6

Ron Lawhorn

Reputation: 21

The sum() built-in takes any iterable and would therefore seem to gain the memory-profile advantage of generators.

The add_n() function for tensors takes a list of tensors and seems to retain that data structure throughout handling, based on its requirement for shape comparison.

In [29]: y = [1,2,3,4,5,6,7,8,9,10]  

In [30]: y.__sizeof__()
Out[30]: 120

In [31]: x = iter(y)

In [32]: x.__sizeof__()
Out[32]: 32
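
As an illustration, sum also accepts a generator, so the full Python list never needs to be materialized. A minimal sketch; the make_tensors helper is hypothetical:

import tensorflow as tf

def make_tensors(n):
    # hypothetical helper: yields 3D float tensors one at a time
    for _ in range(n):
        yield tf.zeros((2, 3, 4))

total = sum(make_tensors(10))  # consumes the generator lazily

Note that this only avoids materializing the Python list; every add op still ends up in the graph.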

Upvotes: 1
