kking
kking

Reputation: 89

How to correctly convert numpy vectorize to numba vectorize

I'm trying to convert my program to Numba, but I have a problem nesting one function inside another. My methods are based on NumPy vectorize, but I can't do the same using numba. Do you know any analogous examples that I could follow?

This is my program:



import numpy as np
import scipy
import functools
from multiprocessing import Pool
import lib as mlib
from tqdm import tqdm

class vectorize(np.vectorize):
    def __get__(self, obj, objtype):
        return functools.partial(self.__call__, obj)

class stability:
    def __init__(self):

        self.r1 = 20
        self.r2 = 50
        self.rz= 20

        self.zeta = self.r2/self.r1
        self.beta = self.rz/self.r1
        self.Ms = 0.956e6
        self.delta = 1

    @vectorize
    def f(self,ro,rs):
        # print("delta=",self.delta)
        return rs/ro*np.exp( (-1/self.delta)*(ro-rs))

    @vectorize
    def mz(self,ro,rs):
        return ( 1-self.f(ro,rs)**2 ) / ( 1+self.f(ro,rs)**2 )

    @vectorize
    def mro(self,ro,rs):
        return ( 2*self.f(ro,rs) ) / ( 1+self.f(ro,rs)**2 )

    @vectorize
    def E1m(self,a, b, N,rs,d):     
        r = np.linspace(a+(b-a)/(2*N), b-(b-a)/(2*N), N)
        fx = r* ((1/self.delta+1/r)**2 * self.mro(r,rs)**2 + (1/r**2 + 1)*self.mro(r,rs)**2+d*(-(1/self.delta + 1/r) * self.mro(r,rs) + 1/r * self.mro(r,rs)*self.mz(r,rs) ))
        area = np.sum(fx)*(b-a)/N
        return area

if __name__ == "__main__":
    rs = np.arange(0,100,1)
    model = stability()
    print(model.E1m(0,20,300,rs,2))

Upvotes: 5

Views: 3518

Answers (2)

MSeifert
MSeifert

Reputation: 152677

Most of the built-in NumPy functions are already vectorized and don't need the np.vectorize decorator at all. In general the numpy.vectorize decorator will produce very slow results (compared to NumPy)! As the documentation mentions in the Notes section:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

You can improve your codes efficiency immensely by removing the decorator from f, mz, and mro. It will give the same results but run much faster (your code 10.4 seconds, changed code 0.014 seconds).

The E1m function could also be improved (in terms of performance) by using broadcasting instead of vectorize.


However since your question was about how to use numba.vectorize on these functions I have some bad news: It's not possible to use numba.vectorize on instance methods - because numba needs type information and these are not available for custom Python classes.

In general it's also best with numba to start with a pure-loop code on NumPy arrays (no vectorize) and then use the numba njit decorator (or jit(nopython=True). That won't work on methods too but it's much easier to pass in scalar arguments and only iterate over the required arrays.


But if you really want to use the vectorize approach, here's how you would do it for f:

  • You cannot use an instance method because of the self, so you need a static method or a standalone function. Because you don't have access to self you either need to pass in delta or make it global. I decided to make it an argument:
def f(ro, rs, delta):
    return rs / ro * np.exp((-1 / delta) * (ro - rs))
  • Then you need to find out the types that your arguments are (or what types you want to support) and what is returned for the signature. Your ro is an integer array, rs is a float array and delta is an integer, so the signature looks like this (the syntax is return_type(argument_1_type, argument_2_type, ....)):
@nb.vectorize('f8(i8, f8, f8)')
def f(ro, rs, delta):
    return rs / ro * np.exp((-1 / delta) * (ro - rs))

And that's basically it.

For mz and mro you can do the same (remember that you also need the delta there):

@nb.vectorize('f8(i8, f8, f8)')
def mz(ro, rs, delta):
    return (1 - f(ro, rs, delta)**2) / (1 + f(ro, rs, delta)**2)

@nb.vectorize('f8(i8, f8, f8)')
def mro(ro, rs, delta):
    return (2 * f(ro, rs, delta) ) / (1 + f(ro, rs, delta)**2)

Converting the E1m function seems to be a bit trickier (I haven't attempted it) and I leave it as an exercise for the reader.


In case you're interested how I would solve it without vectorize:

import numpy as np
import numba as nb

@nb.njit
def f(ro, rs, delta):
    return rs / ro * np.exp((-1 / delta) * (ro - rs))

@nb.njit
def mz(ro, rs, delta):
    f_2 = f(ro, rs, delta) ** 2
    return (1 - f_2) / (1 + f_2)

@nb.njit
def mro(ro, rs, delta):
    f_ = f(ro, rs, delta)
    return (2 * f_ ) / (1 + f_**2)

@nb.njit(parallel=True)
def E1m(a, b, N, rs, d):
    delta = 1
    r = np.linspace(a + (b - a) / (2 * N), b - (b - a) / (2 * N), N)
    result = np.empty(rs.size)
    for idx in nb.prange(rs.size):
        rs_item = rs[idx]
        sum_ = 0.
        for r_item in r:
            mro_ = mro(r_item, rs_item, delta)
            sum_ += r_item * ((1 / delta + 1 / r_item)**2 * mro_**2  
                              + (1 / r_item**2 + 1) * mro_**2  
                              + d * (-(1 / delta + 1 / r_item) * mro_ 
                                     + 1 / r_item * mro_ * mz(r_item, rs_item, delta)))
        result[idx] = sum_ * (b - a) / N
    return result

There's probably still a bit that could be optimized by loop-lifting or smarter approaches with the calculations but on my computer it's already pretty fast: ~100 microseconds compared to the 14 milliseconds from above, so again 100 times faster.

Upvotes: 5

kking
kking

Reputation: 89

MSeifert Thank You! Right now I have 40 times faster solution.

@numba.vectorize('f8(f8, f8, f8)')
def f(ro, rs, delta):
    return rs / ro * np.exp((-1 / delta) * (ro - rs))

@numba.vectorize('f8(f8, f8, f8)')
def mz(ro, rs, delta):
    return (1 - f(ro, rs, delta)**2) / (1 + f(ro, rs, delta)**2)

@numba.vectorize('f8(f8, f8, f8)')
def mro(ro, rs, delta):
    return (2 * f(ro, rs, delta) ) / (1 + f(ro, rs, delta)**2)

@numba.vectorize(nopython=True)
def E1m(a, b, N,rs,d):   
    r = np.linspace(a+(b-a)/(2*N), b-(b-a)/(2*N), N)
    fx = r* ((1/delta+1/r)**2 * mro(r,rs,delta)**2 + (1/r**2 + 1)*mro(r,rs,delta)**2+d*(-(1/delta + 1/r) * mro(r,rs,delta) + 1/r * mro(r,rs,delta)*mz(r,rs,delta) ))
    area = np.sum(fx)*(b-a)/N
    return area

x=np.arange(0,100,1)
%timeit E1m(0,20,300,x,2)

571 µs ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Upvotes: 0

Related Questions