Reputation: 5477
Here is the code:

import numpy as np
from numpy.random import random

@profile
def point_func(point, points, funct):
    return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))

@profile
def point_afunc(ipoints, epoints, funct):
    res = np.zeros(len(ipoints))
    for idx, point in enumerate(ipoints):
        res[idx] = point_func(point, epoints, funct)
    return res

@profile
def main():
    points = random((5000,3))
    rpoint = random((1,3))
    pres = point_func(rpoint, points, lambda r : r**3)
    ares = point_afunc(points, points, lambda r : r**3)

if __name__=="__main__":
    main()
I have profiled it with kernprof and got this:
Timer unit: 1e-06 s
Total time: 2.25667 s
File: point-array-vectorization.py
Function: point_func at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 @profile
5 def point_func(point, points, funct):
6 5001 2256667.0 451.2 100.0 return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))
Total time: 2.27844 s
File: point-array-vectorization.py
Function: point_afunc at line 8
Line # Hits Time Per Hit % Time Line Contents
==============================================================
8 @profile
9 def point_afunc(ipoints, epoints, funct):
10 1 5.0 5.0 0.0 res = np.zeros(len(ipoints))
11 5001 4650.0 0.9 0.2 for idx, point in enumerate(ipoints):
12 5000 2273789.0 454.8 99.8 res[idx] = point_func(point, epoints, funct)
13 1 0.0 0.0 0.0 return res
Total time: 2.28239 s
File: point-array-vectorization.py
Function: main at line 15
Line # Hits Time Per Hit % Time Line Contents
==============================================================
15 @profile
16 def main():
17 1 145.0 145.0 0.0 points = random((5000,3))
18 1 2.0 2.0 0.0 rpoint = random((1,3))
19
20 1 507.0 507.0 0.0 pres = point_func(rpoint, points, lambda r : r**3)
21
22 1 2281731.0 2281731.0 100.0 ares = point_afunc(points, points, lambda r : r**3)
So this part is taking most of the time:
11 5001 4650.0 0.9 0.2 for idx, point in enumerate(ipoints):
12 5000 2273789.0 454.8 99.8 res[idx] = point_func(point, epoints, funct)
I want to see if the time loss is caused by calling funct in a for loop. To do this, I would like to vectorize point_afunc using numpy.vectorize. I have tried it, but it seems to vectorize away the points: the loop ends up looping over individual point components.
@profile
def point_afunc(ipoints, epoints, funct):
    res = np.zeros(len(ipoints))
    for idx, point in enumerate(ipoints):
        res[idx] = point_func(point, epoints, funct)
    return res

point_afunc = np.vectorize(point_afunc)
This leads to an error:
  File "point-array-vectorization.py", line 24, in main
    ares = point_afunc(points, points, lambda r : r**3)
  File "/usr/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2755, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/usr/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2825, in _vectorize_call
    ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
  File "/usr/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2785, in _get_ufunc_and_otypes
    outputs = func(*inputs)
  File "/usr/lib/python3.6/site-packages/line_profiler.py", line 115, in wrapper
    result = func(*args, **kwds)
  File "point-array-vectorization.py", line 10, in point_afunc
    res = np.zeros(len(ipoints))
TypeError: object of type 'numpy.float64' has no len()
Somehow, instead of vectorizing over each point in ipoints, it vectorizes across the points' components?
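For what it's worth, np.vectorize loops over scalars by default, which is why it descends to individual components; its signature argument can tell it to treat each row as one unit. A minimal sketch under that assumption (it is still a hidden Python loop, so no speedup should be expected):

```python
import numpy as np
from numpy.random import random

def point_func(point, points, funct):
    return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))

# Close over funct so np.vectorize only sees array arguments; the
# generalized signature maps one (d,) point plus the whole (n, d)
# array to a scalar, instead of broadcasting down to components.
funct = lambda r: r**3
vec_afunc = np.vectorize(lambda p, pts: point_func(p, pts, funct),
                         signature="(d),(n,d)->()")

points = random((50, 3))
res = vec_afunc(points, points)  # shape (50,), one value per point
```

This only fixes the "looping over components" problem, not the performance one.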
Edit: I tried the advice from @John Zwinck below and used Numba. I got longer execution times with @jit than without it. If I remove the @profile decorator from all functions and replace it with @jit for point_func and point_afunc, these are the execution times:
time ./point_array_vectorization.py
real 0m3.686s
user 0m3.584s
sys 0m0.077s
point-array-vectorization> time ./point_array_vectorization.py
real 0m3.683s
user 0m3.596s
sys 0m0.063s
point-array-vectorization> time ./point_array_vectorization.py
real 0m3.751s
user 0m3.658s
sys 0m0.070s
and with all @jit decorators removed:
point-array-vectorization> time ./point_array_vectorization.py
real 0m2.925s
user 0m2.874s
sys 0m0.030s
point-array-vectorization> time ./point_array_vectorization.py
real 0m2.950s
user 0m2.902s
sys 0m0.029s
point-array-vectorization> time ./point_array_vectorization.py
real 0m2.951s
user 0m2.886s
sys 0m0.042s
Do I need to help the numba compiler more?
Edit: Can point_afunc be written without the for loop using numpy somehow?
Edit: I compared the loop version with the numpy broadcasting version by Peter, and the loop version is faster:
Timer unit: 1e-06 s
Total time: 2.13361 s
File: point_array_vectorization.py
Function: point_func at line 7
Line # Hits Time Per Hit % Time Line Contents
==============================================================
7 @profile
8 def point_func(point, points, funct):
9 5001 2133615.0 426.6 100.0 return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))
Total time: 2.1528 s
File: point_array_vectorization.py
Function: point_afunc at line 11
Line # Hits Time Per Hit % Time Line Contents
==============================================================
11 @profile
12 def point_afunc(ipoints, epoints, funct):
13 1 5.0 5.0 0.0 res = np.zeros(len(ipoints))
14 5001 4176.0 0.8 0.2 for idx, point in enumerate(ipoints):
15 5000 2148617.0 429.7 99.8 res[idx] = point_func(point, epoints, funct)
16 1 0.0 0.0 0.0 return res
Total time: 2.75093 s
File: point_array_vectorization.py
Function: new_point_afunc at line 18
Line # Hits Time Per Hit % Time Line Contents
==============================================================
18 @profile
19 def new_point_afunc(ipoints, epoints, funct):
20 1 2750926.0 2750926.0 100.0 return np.sum(funct(np.sqrt((ipoints[:, None, :] - epoints[None, :, :])**2).sum(axis=-1)), axis=1)
Total time: 4.90756 s
File: point_array_vectorization.py
Function: main at line 22
Line # Hits Time Per Hit % Time Line Contents
==============================================================
22 @profile
23 def main():
24 1 170.0 170.0 0.0 points = random((5000,3))
25 1 4.0 4.0 0.0 rpoint = random((1,3))
26 1 546.0 546.0 0.0 pres = point_func(rpoint, points, lambda r : r**3)
27 1 2155829.0 2155829.0 43.9 ares = point_afunc(points, points, lambda r : r**3)
28 1 2750945.0 2750945.0 56.1 vares = new_point_afunc(points, points, lambda r : r**3)
29 1 71.0 71.0 0.0 assert(np.max(np.abs(ares-vares)) < 1e-15)
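As an aside, assuming the intended metric is the Euclidean distance (note that the question's code takes the square root before the component sum, which yields a sum of per-component absolute differences instead), the pairwise distance matrix can be built with only 2-D intermediates via the identity |a-b|² = |a|² + |b|² - 2a·b, avoiding both the Python loop and the (M, N, 3) temporary. The name gram_point_afunc is hypothetical:

```python
import numpy as np
from numpy.random import random

# Pairwise Euclidean distances from the identity
#   |a - b|^2 = |a|^2 + |b|^2 - 2 a.b,
# computed with a single matrix product -- only 2-D intermediates.
def gram_point_afunc(ipoints, epoints, funct):
    sq_i = (ipoints**2).sum(1)[:, None]           # (M, 1)
    sq_e = (epoints**2).sum(1)[None, :]           # (1, N)
    sq = sq_i + sq_e - 2.0 * ipoints @ epoints.T  # (M, N) squared distances
    np.clip(sq, 0.0, None, out=sq)                # guard tiny negative rounding errors
    return funct(np.sqrt(sq)).sum(axis=1)

points = random((200, 3))
ares = gram_point_afunc(points, points, lambda r: r**3)
```

Whether this beats the loop depends on M, N, and how much the matrix product offloads to BLAS; it is a sketch, not a measured claim.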
Upvotes: 4
Views: 297
Reputation: 6482
The performance of this approach depends on how often you call the created functions and on how large the input data is. With a compilation overhead of about 1.67 s, it is not really suitable for relatively small data or for calling the function only once.
I have also used your code with minor modifications. With Numba, writing plain loops instead of multiple vectorized commands like np.sum(funct(np.sqrt(((point - points)**2)).sum(1))) will be faster, both in runtime and compilation time. Additionally, this problem can easily be parallelized, but that would further increase the compilation time.
Example

import numpy as np
from numpy.random import random
import numba as nb
import time

def make_point_func(funct):
    @nb.njit(fastmath=True)
    def point_func(point, points):
        return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))
    return point_func

def make_point_afunc(funct, point_func):
    @nb.njit(fastmath=True)
    def point_afunc(ipoints, epoints):
        res = np.zeros(len(ipoints))
        for idx in range(len(ipoints)):
            res[idx] = point_func(ipoints[idx], epoints)
        return res
    return point_afunc

def main():
    points = random((5000,3))
    rpoint = random((1,3))

    # Make functions
    point_func = make_point_func(nb.njit(lambda r: r**3))
    point_afunc = make_point_afunc(nb.njit(lambda r: r**3), point_func)

    # First call
    print("First call with compilation overhead")
    t1 = time.time()
    pres = point_func(rpoint, points)
    ares = point_afunc(points, points)
    print(time.time() - t1)

    # Second call
    print("Second call without compilation overhead")
    t1 = time.time()
    pres = point_func(rpoint, points)
    ares = point_afunc(points, points)
    print(time.time() - t1)

if __name__=="__main__":
    main()
Performance
original: 1.7s
Numba first call: 1.87s
Numba further calls: 0.2s
Upvotes: 1
Reputation: 13485
You can use broadcasting for this. Broadcasting is the reshaping of the point matrices so that their dimensions "broadcast" against each other. E.g. ipoints[:, None, :] - epoints[None, :, :] says:
- reshape ipoints from MxD to Mx1xD
- reshape epoints from NxD to 1xNxD
The full code becomes:
def new_point_afunc(ipoints, epoints, funct):
    return np.sum(funct(np.sqrt((ipoints[:, None, :] - epoints[None, :, :])**2).sum(axis=-1)), axis=1)
Warning: in your example the dimensionality of a point is only 3, but for higher dimensions this may not be practical memory-wise, because this ipoints[:, None, :] - epoints[None, :, :] approach creates an intermediate array with shape (len(ipoints), len(epoints), n_dim).
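One way to soften that memory warning, while keeping the same formula, is to process ipoints in blocks so the intermediate is only (chunk, len(epoints), n_dim). The name chunked_point_afunc and the chunk size are illustrative choices, not from the answer:

```python
import numpy as np
from numpy.random import random

def new_point_afunc(ipoints, epoints, funct):
    return np.sum(funct(np.sqrt((ipoints[:, None, :] - epoints[None, :, :])**2).sum(axis=-1)), axis=1)

# Same computation, applied block by block over ipoints: the temporary
# array is (chunk, N, n_dim) instead of (M, N, n_dim).
def chunked_point_afunc(ipoints, epoints, funct, chunk=512):
    out = np.empty(len(ipoints))
    for start in range(0, len(ipoints), chunk):
        block = ipoints[start:start + chunk]
        out[start:start + chunk] = new_point_afunc(block, epoints, funct)
    return out

points = random((1000, 3))
assert np.allclose(chunked_point_afunc(points, points, lambda r: r**3),
                   new_point_afunc(points, points, lambda r: r**3))
```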
Upvotes: 1
Reputation: 249293
numpy.vectorize() does nothing useful in terms of performance: it is just syntactic sugar (or rather, syntactic cyanide) which builds a hidden Python for loop. It won't help you.
One thing that might help you quite a bit is Numba. It can just-in-time compile your original code and will probably speed it up a lot. Just replace your @profile decorators with @numba.jit.
Upvotes: 3