Reputation: 1428
Consider an example:
import numpy as np
x = np.ones((100, 100))
y = np.ones((100, 100))
z = np.ones((100, 100))
f = x * y + y * z
f
how does numpy carry out the execution and manage memory for storing the intermediate results? Will numpy rewrite f to f = (x + z) * y? For f = (x + z) * y, does numpy allocate any temporary memory besides the memory for f?
Or consider another example:
f = a + b + c  # all of the same dimensions
[edit] Following @Daniel's answer, a similar benchmark shows that numpy allocates memory twice for this expression.
It would be helpful if you could point me to some docs or related compiler techniques. Thanks!
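For reference, the temporaries can be observed directly with the standard-library tracemalloc module (NumPy reports its allocations to tracemalloc in recent versions); a minimal sketch, with the exact peak depending on the NumPy version:

```python
import tracemalloc
import numpy as np

size = 1000
bytes_per_array = size * size * 8  # float64

x = np.ones((size, size))
y = np.ones((size, size))
z = np.ones((size, size))

tracemalloc.start()
f = x * y + y * z  # two temporaries (x*y and y*z) plus the result
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# At the peak, both temporaries and the result are live,
# so this ratio should come out near 3.
print(peak / bytes_per_array)
```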
Upvotes: 0
Views: 112
Reputation: 2196
No, numpy does not do any such optimization.
How should it? Numpy implements n-dimensional array objects in a C extension; the rest is pure Python. numpy never gets to see the actual expression you are trying to evaluate. It performs operations one at a time, in the order dictated by Python's evaluation rules (see the Python documentation). Therefore, for each intermediate result, numpy has to allocate temporary memory. Thus:
f = (x * y) + (y * z)
(parentheses for clarity) requires three memory allocations, the last one being bound to f. No expression rewriting is done (that would even be dangerous, as it could change the rounding behaviour).
f = (x + z) * y: two allocations, the last again bound to f.
f = (x + y) + z: also two allocations.
It would of course not be completely impossible for numpy to learn what expression it is evaluating, but without help from the interpreter any such mechanism would be a dirty trick, bound to confuse users. numpy doesn't do any of that.
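If you want to avoid the temporaries by hand, you can preallocate a buffer and use the standard ufunc out= argument; a sketch computing the rewritten form (x + z) * y with a single allocation:

```python
import numpy as np

x = np.ones((100, 100))
y = np.ones((100, 100))
z = np.ones((100, 100))

# One preallocated buffer replaces both temporaries:
f = np.empty_like(x)
np.add(x, z, out=f)       # f = x + z, written in place
np.multiply(f, y, out=f)  # f = (x + z) * y, still in place
```

Note that this applies the algebraic rewrite yourself; numpy only guarantees the two results are equal up to floating-point rounding.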
Upvotes: 1
Reputation: 34187
It seems numpy doesn't do any optimization, but, perhaps more surprisingly, neither does Theano.
Here's a script to compare the two variations for each implementation. Output below.
import timeit
import numpy
import theano
import theano.tensor as tt

def main():
    size = 1000
    iterations = 1000

    a = tt.matrix()
    b = tt.matrix()
    c = tt.matrix()
    f1 = theano.function(inputs=[a, b, c], outputs=a * b + b * c)
    f2 = theano.function(inputs=[a, b, c], outputs=(a + c) * b)
    theano.printing.debugprint(f1)
    theano.printing.debugprint(f2)

    x = numpy.ones((size, size))
    y = numpy.ones((size, size))
    z = numpy.ones((size, size))
    result = x * y + y * z

    start = timeit.default_timer()
    for _ in range(iterations):
        result1 = x * y + y * z
        assert numpy.all(result1 == result)
    print(timeit.default_timer() - start)

    start = timeit.default_timer()
    for _ in range(iterations):
        result2 = (x + z) * y
        assert numpy.all(result2 == result)
    print(timeit.default_timer() - start)

    start = timeit.default_timer()
    for _ in range(iterations):
        result3 = f1(x, y, z)
        assert numpy.all(result3 == result)
    print(timeit.default_timer() - start)

    start = timeit.default_timer()
    for _ in range(iterations):
        result4 = f2(x, y, z)
        assert numpy.all(result4 == result)
    print(timeit.default_timer() - start)

main()
I get the following output:
Elemwise{Composite{((i0 * i1) + (i1 * i2))}} [@A] '' 0
|<TensorType(float64, matrix)> [@B]
|<TensorType(float64, matrix)> [@C]
|<TensorType(float64, matrix)> [@D]
Elemwise{Composite{((i0 + i1) * i2)}} [@A] '' 0
|<TensorType(float64, matrix)> [@B]
|<TensorType(float64, matrix)> [@C]
|<TensorType(float64, matrix)> [@D]
9.19932313948
6.43367212255
4.15276831469
4.07725744595
So the manually optimized version is faster for both numpy and Theano, but the difference is smaller with Theano. The Theano computation graphs printed above show that Theano's optimizing compiler has not automatically rewritten the computation. Theano's version is faster overall, though.
Note that these results can differ depending on the size of the matrices being operated on.
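To check that size dependence yourself, here is a numpy-only sketch using timeit.repeat (Theano omitted for simplicity; absolute timings will vary by machine):

```python
import timeit
import numpy as np

for size in (100, 1000):
    x = np.ones((size, size))
    y = np.ones((size, size))
    z = np.ones((size, size))
    # min over repeats gives the least-noisy estimate per expression
    t1 = min(timeit.repeat(lambda: x * y + y * z, number=100, repeat=3))
    t2 = min(timeit.repeat(lambda: (x + z) * y, number=100, repeat=3))
    print(size, t1, t2)
```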
Upvotes: 1