Let's say I have two functions:
```python
def my_sub1(a):
    return a + 2

def my_main(a):
    a += 1
    b = my_sub1(a)
    return b
```
and I want to make them faster using a just-in-time compiler like Numba. Is this going to be slower than if I refactor everything into one function,
```python
def my_main(a):
    a += 1
    b = a + 2
    return b
```
because Numba can do deeper optimizations in the second case? Of course, my real functions are quite a bit more complex.
Also, this whole situation gets more difficult if `my_sub1` is called more than once: refactoring (and maintaining) the inlined code would become a drag. How does Numba solve this issue?
TL;DR: with native types such as integers, Numba inlines calls to other Numba functions and performs relatively advanced inter-procedural optimizations, so both versions are equally fast. With Numpy arrays, it does not seem to do so.
You can analyze the resulting assembly code produced by Numba to check how the two functions are optimized. Here is an example with an integer:
```python
import numba as nb

@nb.njit('int64(int64)')
def my_sub1(a):
    return a + 2

@nb.njit('int64(int64)')
def my_main(a):
    a += 1
    b = my_sub1(a)
    return b

# inspect_asm() returns a dict {signature: assembly}; there is one signature here
with open('my_sub1.asm', 'w') as f:
    f.write(list(my_sub1.inspect_asm().values())[0])
with open('my_main.asm', 'w') as f:
    f.write(list(my_main.inspect_asm().values())[0])
```
This produces two assembly files. If you compare them, you will see that the only actual difference (besides the function names) is that the first does `addq $2, %rdx` while the second does `addq $3, %rdx`. This means that Numba succeeded in inlining the call to `my_sub1` in `my_main` and merging the two additions. Here is the important part of the assembly code:
```asm
_ZN8__main__12my_sub1$2413Ex:
    addq $2, %rdx
    movq %rdx, (%rdi)
    xorl %eax, %eax
    retq

_ZN8__main__12my_main$2414Ex:
    addq $3, %rdx
    movq %rdx, (%rdi)
    xorl %eax, %eax
    retq
```
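Comparing the two files by eye works for tiny functions. For bigger ones, a rough check is to search `my_main`'s assembly for any instruction that still mentions `my_sub1`. The helper below is a heuristic sketch of that idea, not a Numba API: `references_symbol` is a hypothetical name, and `NOT_INLINED` is a made-up listing of what an out-of-line call might look like (the exact call sequence Numba emits in that case is an assumption).

```python
def references_symbol(asm: str, name: str) -> bool:
    """Heuristic: return True if an instruction in `asm` mentions `name`.

    Labels (lines ending in ':'), assembler directives (lines starting
    with '.') and comments (lines starting with '#') are skipped, so a
    callee whose body happens to sit in the same module does not count;
    only instructions that refer to the symbol (e.g. a call) do.
    """
    for raw_line in asm.splitlines():
        line = raw_line.strip()
        if not line or line.endswith(":") or line.startswith((".", "#")):
            continue
        if name in line:
            return True
    return False


# my_main from the listing above: my_sub1 was inlined, no reference remains.
INLINED = """\
_ZN8__main__12my_main$2414Ex:
    addq $3, %rdx
    movq %rdx, (%rdi)
    xorl %eax, %eax
    retq
"""

# Hypothetical listing where the call was NOT inlined (illustration only).
NOT_INLINED = """\
_ZN8__main__12my_main$2414Ex:
    addq $1, %rdx
    movabsq $_ZN8__main__12my_sub1$2413Ex, %rax
    callq *%rax
    retq
"""

print(references_symbol(INLINED, "my_sub1"))      # False
print(references_symbol(NOT_INLINED, "my_sub1"))  # True
```

With the Numba functions above, you would pass `list(my_main.inspect_asm().values())[0]` as `asm`. Treat `False` as a hint rather than a proof: it only says that no instruction names the callee.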
With 64-bit floats, you get the same result only if you use `fastmath=True`, since floating-point addition is not associative: without it, the compiler is not allowed to merge `(a + 1) + 2` into `a + 3`.
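To see why the constants cannot be merged without `fastmath`, here is a small pure-Python check showing that `(a + 1) + 2` and `a + 3` can round to different doubles. The value `2**53` is chosen only for illustration:

```python
a = 2.0 ** 53  # 9007199254740992.0; at this magnitude adjacent doubles are 2 apart

two_steps = (a + 1.0) + 2.0  # what my_main computes: a += 1, then my_sub1 adds 2
merged = a + 3.0             # what the compiler would emit after merging constants

# a + 1.0 is a tie and rounds back down to 2**53 (round-half-to-even),
# while a + 3.0 is a tie that rounds up to 2**53 + 4.
print(two_steps)  # 9007199254740994.0
print(merged)     # 9007199254740996.0
```

With Numba, opting in to this reassociation looks like `@nb.njit('float64(float64)', fastmath=True)`; it trades strict IEEE semantics for the merged addition.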
Regarding Numpy arrays, the generated code gets huge and it is very difficult to compare the two versions. However, `my_sub1` no longer seems to be inlined, and Numba does not seem able to merge the Numpy computations: the generated code contains two distinct vectorized loops, one for each array addition. Note that this is similar to what many C/C++ compilers do. As a result, it is probably better to inline functions yourself in performance-critical parts of your code.
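A minimal sketch of inlining by hand for the array case, assuming 1-D integer arrays: instead of two array expressions (two passes over the data and a temporary array), write one explicit loop. The `_arr` and `_fused` names are hypothetical, and the `njit` fallback only exists so the sketch still runs, slowly, without Numba installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: fall back to plain Python if Numba is absent
    def njit(*args, **kwargs):
        # No-op decorator supporting both @njit and @njit(...) forms.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit
def my_sub1_arr(a):
    return a + 2           # allocates a new array


@njit
def my_main_arr(a):
    a = a + 1              # first pass, temporary array
    return my_sub1_arr(a)  # second pass, second array


@njit
def my_main_fused(a):
    # Hand-inlined version: one loop, one output array, no temporary.
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        out[i] = a[i] + 1 + 2
    return out
```

The fused version computes the same values in a single pass. Numba also documents an `inline='always'` option for `@njit` that inlines at the Numba IR level while keeping the helper as a separate function in the source; whether that makes the two array loops merge was not measured here.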