Reputation: 5477
Here is the code:

import numpy as np
from numpy.random import random

@profile
def point_func(point, points, funct):
    return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))

@profile
def point_afunc(ipoints, epoints, funct):
    res = np.zeros(len(ipoints))
    for idx, point in enumerate(ipoints):
        res[idx] = point_func(point, epoints, funct)
    return res

@profile
def main():
    points = random((5000,3))
    rpoint = random((1,3))
    pres = point_func(rpoint, points, lambda r : r**3)
    ares = point_afunc(points, points, lambda r : r**3)

if __name__=="__main__":
    main()
I have profiled it with kernprof and got this:
Timer unit: 1e-06 s
Total time: 2.25667 s
File: point-array-vectorization.py
Function: point_func at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 @profile
5 def point_func(point, points, funct):
6 5001 2256667.0 451.2 100.0 return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))
Total time: 2.27844 s
File: point-array-vectorization.py
Function: point_afunc at line 8
Line # Hits Time Per Hit % Time Line Contents
==============================================================
8 @profile
9 def point_afunc(ipoints, epoints, funct):
10 1 5.0 5.0 0.0 res = np.zeros(len(ipoints))
11 5001 4650.0 0.9 0.2 for idx, point in enumerate(ipoints):
12 5000 2273789.0 454.8 99.8 res[idx] = point_func(point, epoints, funct)
13 1 0.0 0.0 0.0 return res
Total time: 2.28239 s
File: point-array-vectorization.py
Function: main at line 15
Line # Hits Time Per Hit % Time Line Contents
==============================================================
15 @profile
16 def main():
17 1 145.0 145.0 0.0 points = random((5000,3))
18 1 2.0 2.0 0.0 rpoint = random((1,3))
19
20 1 507.0 507.0 0.0 pres = point_func(rpoint, points, lambda r : r**3)
21
22 1 2281731.0 2281731.0 100.0 ares = point_afunc(points, points, lambda r : r**3)
So this part is taking most of the time:
11 5001 4650.0 0.9 0.2 for idx, point in enumerate(ipoints):
12 5000 2273789.0 454.8 99.8 res[idx] = point_func(point, epoints, funct)
I want to see if the time loss is caused by calling funct in a for loop. To do this, I would like to vectorize point_afunc using numpy.vectorize. I have tried it, but it seems to vectorize away the points: the loop ends up looping over individual point components.
@profile
def point_afunc(ipoints, epoints, funct):
    res = np.zeros(len(ipoints))
    for idx, point in enumerate(ipoints):
        res[idx] = point_func(point, epoints, funct)
    return res

point_afunc = np.vectorize(point_afunc)
This leads to an error:
  File "point-array-vectorization.py", line 24, in main
    ares = point_afunc(points, points, lambda r : r**3)
  File "/usr/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2755, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/usr/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2825, in _vectorize_call
    ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
  File "/usr/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2785, in _get_ufunc_and_otypes
    outputs = func(*inputs)
  File "/usr/lib/python3.6/site-packages/line_profiler.py", line 115, in wrapper
    result = func(*args, **kwds)
  File "point-array-vectorization.py", line 10, in point_afunc
    res = np.zeros(len(ipoints))
TypeError: object of type 'numpy.float64' has no len()
Somehow, instead of vectorizing over each point in ipoints, it vectorizes across the points' components?
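For what it's worth, np.vectorize loops over scalars by default, which is why it descends to individual components; its signature argument can tell it to treat each row as one unit. A minimal sketch under that assumption (it is still a hidden Python loop, so no speedup should be expected):

```python
import numpy as np
from numpy.random import random

def point_func(point, points, funct):
    return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))

# Close over funct so np.vectorize only sees array arguments; the
# generalized signature maps one (d,) point plus the whole (n, d)
# array to a scalar, instead of broadcasting down to components.
funct = lambda r: r**3
vec_afunc = np.vectorize(lambda p, pts: point_func(p, pts, funct),
                         signature="(d),(n,d)->()")

points = random((50, 3))
res = vec_afunc(points, points)  # shape (50,), one value per point
```

This only fixes the "looping over components" problem, not the performance one.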
Edit: I tried the advice from @John Zwinck below and used Numba. I got longer execution times with @jit than without it. If I remove the @profile decorator from all functions and replace it with @jit for point_func and point_afunc, these are the execution times:
time ./point_array_vectorization.py
real 0m3.686s
user 0m3.584s
sys 0m0.077s
point-array-vectorization> time ./point_array_vectorization.py
real 0m3.683s
user 0m3.596s
sys 0m0.063s
point-array-vectorization> time ./point_array_vectorization.py
real 0m3.751s
user 0m3.658s
sys 0m0.070s
and with all @jit decorators removed:
point-array-vectorization> time ./point_array_vectorization.py
real 0m2.925s
user 0m2.874s
sys 0m0.030s
point-array-vectorization> time ./point_array_vectorization.py
real 0m2.950s
user 0m2.902s
sys 0m0.029s
point-array-vectorization> time ./point_array_vectorization.py
real 0m2.951s
user 0m2.886s
sys 0m0.042s
Do I need to help the numba compiler more?
Edit: Can point_afunc be written without the for loop using numpy somehow?
Edit: I compared the loop version with the numpy broadcasting version by Peter, and the loop version is faster:
Timer unit: 1e-06 s
Total time: 2.13361 s
File: point_array_vectorization.py
Function: point_func at line 7
Line # Hits Time Per Hit % Time Line Contents
==============================================================
7 @profile
8 def point_func(point, points, funct):
9 5001 2133615.0 426.6 100.0 return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))
Total time: 2.1528 s
File: point_array_vectorization.py
Function: point_afunc at line 11
Line # Hits Time Per Hit % Time Line Contents
==============================================================
11 @profile
12 def point_afunc(ipoints, epoints, funct):
13 1 5.0 5.0 0.0 res = np.zeros(len(ipoints))
14 5001 4176.0 0.8 0.2 for idx, point in enumerate(ipoints):
15 5000 2148617.0 429.7 99.8 res[idx] = point_func(point, epoints, funct)
16 1 0.0 0.0 0.0 return res
Total time: 2.75093 s
File: point_array_vectorization.py
Function: new_point_afunc at line 18
Line # Hits Time Per Hit % Time Line Contents
==============================================================
18 @profile
19 def new_point_afunc(ipoints, epoints, funct):
20 1 2750926.0 2750926.0 100.0 return np.sum(funct(np.sqrt((ipoints[:, None, :] - epoints[None, :, :])**2).sum(axis=-1)), axis=1)
Total time: 4.90756 s
File: point_array_vectorization.py
Function: main at line 22
Line # Hits Time Per Hit % Time Line Contents
==============================================================
22 @profile
23 def main():
24 1 170.0 170.0 0.0 points = random((5000,3))
25 1 4.0 4.0 0.0 rpoint = random((1,3))
26 1 546.0 546.0 0.0 pres = point_func(rpoint, points, lambda r : r**3)
27 1 2155829.0 2155829.0 43.9 ares = point_afunc(points, points, lambda r : r**3)
28 1 2750945.0 2750945.0 56.1 vares = new_point_afunc(points, points, lambda r : r**3)
29 1 71.0 71.0 0.0 assert(np.max(np.abs(ares-vares)) < 1e-15)
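As an aside, assuming the intended metric is the Euclidean distance (note that the question's code takes the square root before the component sum, which yields a sum of per-component absolute differences instead), the pairwise distance matrix can be built with only 2-D intermediates via the identity |a-b|² = |a|² + |b|² - 2a·b, avoiding both the Python loop and the (M, N, 3) temporary. The name gram_point_afunc is hypothetical:

```python
import numpy as np
from numpy.random import random

# Pairwise Euclidean distances from the identity
#   |a - b|^2 = |a|^2 + |b|^2 - 2 a.b,
# computed with a single matrix product -- only 2-D intermediates.
def gram_point_afunc(ipoints, epoints, funct):
    sq_i = (ipoints**2).sum(1)[:, None]           # (M, 1)
    sq_e = (epoints**2).sum(1)[None, :]           # (1, N)
    sq = sq_i + sq_e - 2.0 * ipoints @ epoints.T  # (M, N) squared distances
    np.clip(sq, 0.0, None, out=sq)                # guard tiny negative rounding errors
    return funct(np.sqrt(sq)).sum(axis=1)

points = random((200, 3))
ares = gram_point_afunc(points, points, lambda r: r**3)
```

Whether this beats the loop depends on M, N, and how much the matrix product offloads to BLAS; it is a sketch, not a measured claim.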
Upvotes: 4
Views: 297
Reputation: 6482
The performance of this approach depends on how often you call the created functions and on how large the input data is. With a compilation overhead of about 1.67 s, it is not really suitable for relatively small data or for calling the function only once.
I have also used your code with minor modifications. With Numba, writing plain loops instead of multiple vectorized commands like np.sum(funct(np.sqrt(((point - points)**2)).sum(1))) will be faster, both in runtime and compilation time. Additionally, this problem can easily be parallelized, but that would further increase the compilation time.
Example

import numpy as np
from numpy.random import random
import numba as nb
import time

def make_point_func(funct):
    @nb.njit(fastmath=True)
    def point_func(point, points):
        return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))
    return point_func

def make_point_afunc(funct, point_func):
    @nb.njit(fastmath=True)
    def point_afunc(ipoints, epoints):
        res = np.zeros(len(ipoints))
        for idx in range(len(ipoints)):
            res[idx] = point_func(ipoints[idx], epoints)
        return res
    return point_afunc

def main():
    points = random((5000,3))
    rpoint = random((1,3))

    # Make functions
    point_func = make_point_func(nb.njit(lambda r: r**3))
    point_afunc = make_point_afunc(nb.njit(lambda r: r**3), point_func)

    # First call
    print("First call with compilation overhead")
    t1 = time.time()
    pres = point_func(rpoint, points)
    ares = point_afunc(points, points)
    print(time.time() - t1)

    # Second call
    print("Second call without compilation overhead")
    t1 = time.time()
    pres = point_func(rpoint, points)
    ares = point_afunc(points, points)
    print(time.time() - t1)

if __name__=="__main__":
    main()
Performance
original: 1.7s
Numba first call: 1.87s
Numba further calls: 0.2s
Upvotes: 1
Reputation: 13485
You can use broadcasting for this. Broadcasting is the reshaping of the point matrices so that their dimensions "broadcast" against each other. E.g. ipoints[:, None, :] - epoints[None, :, :] says:
- reshape ipoints from MxD to Mx1xD
- reshape epoints from NxD to 1xNxD
The full code becomes:
def new_point_afunc(ipoints, epoints, funct):
    return np.sum(funct(np.sqrt((ipoints[:, None, :] - epoints[None, :, :])**2).sum(axis=-1)), axis=1)
Warning: in your example the dimensionality of a point is only 3, but for higher dimensions this may not be practical memory-wise, because this ipoints[:, None, :] - epoints[None, :, :] approach creates an intermediate array with shape (len(ipoints), len(epoints), n_dim).
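One way to soften that memory warning, while keeping the same formula, is to process ipoints in blocks so the intermediate is only (chunk, len(epoints), n_dim). The name chunked_point_afunc and the chunk size are illustrative choices, not from the answer:

```python
import numpy as np
from numpy.random import random

def new_point_afunc(ipoints, epoints, funct):
    return np.sum(funct(np.sqrt((ipoints[:, None, :] - epoints[None, :, :])**2).sum(axis=-1)), axis=1)

# Same computation, applied block by block over ipoints: the temporary
# array is (chunk, N, n_dim) instead of (M, N, n_dim).
def chunked_point_afunc(ipoints, epoints, funct, chunk=512):
    out = np.empty(len(ipoints))
    for start in range(0, len(ipoints), chunk):
        block = ipoints[start:start + chunk]
        out[start:start + chunk] = new_point_afunc(block, epoints, funct)
    return out

points = random((1000, 3))
assert np.allclose(chunked_point_afunc(points, points, lambda r: r**3),
                   new_point_afunc(points, points, lambda r: r**3))
```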
Upvotes: 1
Reputation: 249293
numpy.vectorize() does nothing useful in terms of performance: it is just syntactic sugar (or rather, syntactic cyanide) which builds a hidden Python for loop. It won't help you.
One thing that might help you quite a bit is Numba. It can just-in-time compile your original code and will probably speed it up a lot. Just replace your @profile decorators with @numba.jit.
Upvotes: 3