Yixing Lao

Reputation: 1428

How does numpy manage memory when evaluating an expression?

Consider an example:

import numpy as np
x = np.ones((100, 100))
y = np.ones((100, 100))
z = np.ones((100, 100))
f = x * y + y * z

Or consider another example:

f = a + b + c  # all arrays of the same shape

[edit] Following @Daniel's answer, a similar benchmark shows that numpy allocates memory twice for this expression.
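That two-allocation count matches Python's left-to-right evaluation: f = a + b + c runs as (a + b) + c, one __add__ call at a time. A toy class (purely illustrative, not numpy code) makes the order visible:

```python
# Toy class that logs each intermediate object the interpreter
# asks for; the names are purely illustrative.
class Logged:
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        result = Logged("(" + self.name + "+" + other.name + ")")
        print("allocating", result.name)
        return result

a, b, c = Logged("a"), Logged("b"), Logged("c")
f = a + b + c
# allocating (a+b)
# allocating ((a+b)+c)
```

Each __add__ call stands in for one numpy operation that must produce a fresh array, hence two allocations for a + b + c.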

It would be helpful if you could point me to some docs or related compiler techniques. Thanks!

Upvotes: 0

Views: 112

Answers (2)

burnpanck

Reputation: 2196

No, numpy does not do any such optimization.

How could it? Numpy implements n-dimensional array objects in a C extension; the rest is pure Python. Numpy therefore never sees the full expression you are trying to evaluate: the interpreter hands it one operation at a time, following Python's evaluation order (see the Python language reference). For each intermediate result, numpy has to allocate temporary memory. Therefore:

  • f = (x*y) + (y*z) (parentheses for clarity) takes three memory allocations, the last one being used for f. No expression rewriting is done (that would even be dangerous, as it could change the rounding behaviour).
  • f = (x+z)*y: two allocations, the last again bound to f.
  • f = (x + y) + z: also two allocations.
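On NumPy versions whose buffers are reported to Python's tracemalloc (NumPy 1.13 and later), these allocation counts can be checked empirically. A sketch, with the caveat that recent NumPy releases may also reuse large temporaries internally, so the exact peak varies by version; the rewritten form should still peak no higher than the original:

```python
import tracemalloc
import numpy as np

n = 500
x = np.ones((n, n))
y = np.ones((n, n))
z = np.ones((n, n))

# Peak traced memory while evaluating the original expression:
# temporaries for x*y and y*z, plus the result.
tracemalloc.start()
f1 = x * y + y * z
_, peak_original = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Peak for the manually factored form: one temporary (x+z),
# plus the result.
tracemalloc.start()
f2 = (x + z) * y
_, peak_rewritten = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(peak_original, peak_rewritten)
```

Dividing each peak by x.nbytes gives the approximate number of array-sized buffers alive at once.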

It would of course not be completely impossible for numpy to learn which expression you are evaluating, but without help from the interpreter, every such approach would be a dirty trick, bound to confuse users. numpy does none of that.
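Since numpy cannot eliminate the temporaries itself, a caller who cares about memory can do so by hand, using the out= argument that every ufunc accepts. A minimal sketch for the x*y + y*z example:

```python
import numpy as np

x = np.ones((100, 100))
y = np.ones((100, 100))
z = np.ones((100, 100))

# Preallocate the result and one scratch buffer; after this,
# the computation itself allocates nothing new.
f = np.empty_like(x)
tmp = np.empty_like(x)

np.multiply(x, y, out=f)    # f = x * y
np.multiply(y, z, out=tmp)  # tmp = y * z
np.add(f, tmp, out=f)       # f = x*y + y*z, written in place
```

For repeated evaluation (e.g. inside a loop), the same buffers can be reused across iterations, which is where the savings actually matter.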

Upvotes: 1

Daniel Renshaw

Reputation: 34187

It seems numpy doesn't do any optimization but, perhaps more surprisingly, neither does Theano.

Here's a script to compare the two variations for each implementation. Output below.

import timeit
import numpy
import theano
import theano.tensor as tt


def main():
    size = 1000
    iterations = 1000

    a = tt.matrix()
    b = tt.matrix()
    c = tt.matrix()
    f1 = theano.function(inputs=[a, b, c], outputs=a * b + b * c)
    f2 = theano.function(inputs=[a, b, c], outputs=(a + c) * b)
    theano.printing.debugprint(f1)
    theano.printing.debugprint(f2)

    x = numpy.ones((size, size))
    y = numpy.ones((size, size))
    z = numpy.ones((size, size))

    result = x * y + y * z

    # numpy, original expression
    start = timeit.default_timer()
    for _ in range(iterations):
        result1 = x * y + y * z
        assert numpy.all(result1 == result)
    print(timeit.default_timer() - start)

    # numpy, manually factored expression
    start = timeit.default_timer()
    for _ in range(iterations):
        result2 = (x + z) * y
        assert numpy.all(result2 == result)
    print(timeit.default_timer() - start)

    # theano, original expression
    start = timeit.default_timer()
    for _ in range(iterations):
        result3 = f1(x, y, z)
        assert numpy.all(result3 == result)
    print(timeit.default_timer() - start)

    # theano, manually factored expression
    start = timeit.default_timer()
    for _ in range(iterations):
        result4 = f2(x, y, z)
        assert numpy.all(result4 == result)
    print(timeit.default_timer() - start)


main()

I get the following output:

Elemwise{Composite{((i0 * i1) + (i1 * i2))}} [@A] ''   0
 |<TensorType(float64, matrix)> [@B]
 |<TensorType(float64, matrix)> [@C]
 |<TensorType(float64, matrix)> [@D]
Elemwise{Composite{((i0 + i1) * i2)}} [@A] ''   0
 |<TensorType(float64, matrix)> [@B]
 |<TensorType(float64, matrix)> [@C]
 |<TensorType(float64, matrix)> [@D]
9.19932313948
6.43367212255
4.15276831469
4.07725744595

So the manually optimized version is faster for both numpy and Theano, but the difference is smaller for Theano. The Theano computation graphs printed above show that Theano's optimizing compiler has not automatically improved the computation. Theano's version is faster overall, though.

Note that these results can differ depending on the size of the matrices being operated on.

Upvotes: 1
