Praveen Gupta Sanka

Reputation: 631

Declaring Numba Vectorize for returning two variables

To speed up a loop, I used Numba's vectorize decorator.

import numba
import pandas as pd

s1 = pd.Series([1,3,5,6,8,10,1,1,1,1,1,1])
s2 = pd.Series([4,5,6,8,10,1,7,1,6,5,4,3])

ding=pd.DataFrame({'A':s1,'B':s2})

@numba.vectorize(['float64(int16,int16)'])
def sumd(a,b):    
    if a==1:
        return (a+b)
    else:
        return 0

ding['sum']=sumd(ding.A,ding.B)

Now I also want to return the product of columns A and B. In other words, I want a function decorated with vectorize to return two values. I am not sure how to declare the signature for numba.vectorize in that case. Please help me. I am also open to any other ways of making this faster.

One alternative I tried is the following, but it seems more complicated than it should be. I am looking for an easier way to optimize the function. Thanks in advance.

s1 = pd.Series([1,3,5,6,8,10,1,1,1,1,1,1])
s2 = pd.Series([4,5,6,8,10,1,7,1,6,5,4,3])

ding=pd.DataFrame({'A':s1,'B':s2})

@numba.vectorize(['float64(int16,int16)'])
def sumd(a,b):    
    if a==1:
        sumarr.append((a+b))
        prodarr.append(a*b)
        return 1
    else:
        sumarr.append(0)
        prodarr.append(0)
        return 1

sumarr=[]
prodarr=[]
sumd(ding.A,ding.B)
ding['sum']=sumarr
ding['prod']=prodarr

Upvotes: 2

Views: 1072

Answers (2)

cristipurdel

Reputation: 1

You could try one of the following:

1. Add an extra argument that selects between sum and product, and run your code twice (once per output). This is also helpful for the parallel and cuda targets.

@numba.vectorize(['float64(int16,int16,int16)'])
def sumd(a, b, retopt):
    if retopt == 1:
        return (a+b)
    if retopt == 2:
        return (a*b)
    return 0

2. Pack the sum and the product into a single return value. For example, if you know that the largest absolute value in s1 and s2 is 37, take kbypass as the next power of ten, 100 (it must be larger than the largest possible sum), and return

    return kbypass * product + sum

then unpack the result with something like

product, sum = divmod(out, kbypass)
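The packing arithmetic can be checked with plain Python integers. This is only a minimal sketch: the helper names `pack` and `unpack` are hypothetical, `KBYPASS` plays the role of kbypass, and the inputs are assumed to be non-negative integers no larger than 37, so that the sum always stays below `KBYPASS`.

```python
# Hypothetical helpers; plain Python ints stand in for array elements.
KBYPASS = 100  # assumed larger than the largest possible sum (37 + 37 = 74)

def pack(a, b):
    """Encode the product and the sum of two non-negative ints in one int."""
    return KBYPASS * (a * b) + (a + b)

def unpack(out):
    """Recover (product, sum) from a packed value."""
    return divmod(out, KBYPASS)

packed = pack(1, 4)             # 100 * 4 + 5
product, total = unpack(packed)
print(packed, product, total)   # 405 4 5
```

If the packed value is returned as float64, as in the signature above, it is only exact while it stays below 2**53.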

Upvotes: 0

JoshAdel

Reputation: 68682

You can't return multiple values from vectorize and using global lists is not going to work. I would just use a standard jit function instead:

import numba as nb
import numpy as np

@nb.jit(nopython=True)
def sumd(a, b):
    sumx = np.zeros_like(a, dtype=np.float64)
    prodx = np.zeros_like(a, dtype=np.float64)

    for i in range(a.shape[0]):
        if a[i] == 1:
            sumx[i] = a[i] + b[i]
            prodx[i] = a[i] * b[i]

    return sumx, prodx

sumx, prodx = sumd(ding.A.values, ding.B.values)
ding['sum'] = sumx
ding['prod'] = prodx

Note that I'm passing in the values of each column (plain NumPy arrays rather than pandas Series) so that I can use numba in nopython mode, which is generally much more efficient than falling back to object mode.
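For comparison, the same two columns can probably also be computed without Numba by using a boolean mask and `np.where`. This is a sketch under the assumption that the data fits comfortably in memory and that the condition stays as simple as `a == 1`. The arrays below copy the values from the question, and the names `mask`, `sumx` and `prodx` are chosen here only to mirror the loop version.

```python
import numpy as np

# Same values as columns A and B in the question.
a = np.array([1, 3, 5, 6, 8, 10, 1, 1, 1, 1, 1, 1])
b = np.array([4, 5, 6, 8, 10, 1, 7, 1, 6, 5, 4, 3])

mask = a == 1  # True where the condition from the question holds

# Keep the sum / product where the mask is True, 0 elsewhere.
sumx = np.where(mask, a + b, 0).astype(np.float64)
prodx = np.where(mask, a * b, 0).astype(np.float64)

print(sumx)
print(prodx)
```

Whether this is faster than the jit loop depends on the array size, so it is worth timing both on real data.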

Upvotes: 4
