numba numba.core.errors.TypingError dot(array(int8, 1d, C), array(int8, 1d, C))

Question

I am trying to calculate similarity with NumPy functions. My arrays (current_cart and data_matrix) contain only 0 and 1, so I am using np.int8 as the data type. To speed up the calculation I am using Numba, but I am getting the following error.

numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend) Failed in nopython mode pipeline (step: nopython frontend)

My code:

from numba import jit, prange
from numpy.linalg import norm
from numpy import zeros, dot, float64

@jit(nopython=True)
def cosine_similarity(a,b):
    dt = dot(a,b)
    if(abs(dt)<=1e-10):
        return 0
    else:
        return dt/(norm(a)*norm(b))
     
@jit(nopython=True, parallel=True)
def calculate_similarity_parallel(multiple_item, single_item):
    n = multiple_item.shape[0]
    scores = zeros(shape=(n), dtype=float64)
    for i in prange(n):
        scores[i] = cosine_similarity(a=single_item, b=multiple_item[i])
    return scores

scores = calculate_similarity_parallel(
            multiple_item=data_matrix,
            single_item=current_cart
        )

The data looks like this:

data_matrix = [[1 0 0 ... 0 0 0]
               [0 1 0 ... 0 0 0]
               [0 0 1 ... 0 0 0]
               ...
               [0 0 0 ... 0 0 0]
               [0 0 0 ... 0 0 0]
               [0 0 0 ... 0 0 0]]

current_cart = [1 1 0 ... 0 0 0]

The full error is below:

numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function() found for signature:
 
 >>> dot(array(int8, 1d, C), array(int8, 1d, C))
 
There are 4 candidate implementations:
  - Of which 4 did not match due to:
  Overload in function '_OverloadWrapper._build..ol_generated': File: numba/core/overload_glue.py: Line 131.
    With argument(s): '(array(int8, 1d, C), array(int8, 1d, C))':
   Rejected as the implementation raised a specific error:
     TypingError: Failed in nopython mode pipeline (step: nopython frontend)
   No implementation of function Function() found for signature:
    
    >>> stub(array(int8, 1d, C), array(int8, 1d, C))
    
   There are 2 candidate implementations:
     - Of which 2 did not match due to:
     Intrinsic in function 'stub': File: numba/core/overload_glue.py: Line 35.
       With argument(s): '(array(int8, 1d, C), array(int8, 1d, C))':
      Rejected as the implementation raised a specific error:
        TypingError: np.dot() only supported on float and complex arrays
     raised from /home/ak/Desktop/recommendation/venv/lib/python3.9/site-packages/numba/core/typing/npydecl.py:970
   
   During: resolving callee type: Function()
   During: typing of call at  (3)
   
   
   File "", line 3:
   

  raised from /home/ak/Desktop/recommendation/venv/lib/python3.9/site-packages/numba/core/typeinfer.py:1086

During: resolving callee type: Function()
During: typing of call at /home/ak/Desktop/recommendation/./cf_api/utils/utils.py (7)


File "cf_api/utils/utils.py", line 7:
def cosine_similarity(a,b):
    dt = dot(a,b)
    ^

During: resolving callee type: type(CPUDispatcher())
During: typing of call at /home/ak/Desktop/recommendation/./cf_api/utils/utils.py (18)

During: resolving callee type: type(CPUDispatcher())
During: typing of call at /home/ak/Desktop/recommendation/./cf_api/utils/utils.py (18)


File "cf_api/utils/utils.py", line 18:
def calculate_similarity_parallel(multiple_item, single_item):
    
    for i in prange(n):
        scores[i] = cosine_similarity(a=single_item, b=multiple_item[i])
        ^

Any idea how to solve this?

Jérôme Richard · Accepted Answer

The error simply means that Numba does not implement dot for the int8 type (the same applies to norm, by the way), so you need to reimplement it. This is unfortunately quite common with Numba. That said, it is not a big deal here, since a basic loop does the job.

Writing the loop yourself is not so bad, because it makes you think about which accumulator type to use. Indeed, NumPy uses the array's own type by default (i.e. int8), which almost certainly causes sneaky hidden overflows when calling dot. The best type depends heavily on the size of the arrays (which is not provided) and on the input values. It also affects performance (smaller types are generally faster in such cases because of the potential use of SIMD instructions).

Additionally, there is no need for an actual multiplication: since the input contains only binary values, the dot product can be computed with logical ANDs. Note also that abs(dt)<=1e-10 does not make much sense, since the output must be an integer. Finally, the norm can be simplified too: squaring a binary value leaves it unchanged, so there is no need to actually square the values.
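As a quick check of the accumulator and binary-value points, here is a minimal sketch in plain NumPy (no Numba involved). The array length of 200 and the names narrow, wide, x and y are arbitrary choices for illustration, not taken from the question.

```python
import numpy as np

# 200 ones: the true dot product is 200, which does not fit in int8 (max 127).
a = np.ones(200, dtype=np.int8)
b = np.ones(200, dtype=np.int8)

# The result keeps the int8 dtype, so it cannot represent 200.
narrow = np.dot(a, b)
# Casting the inputs to a wider type first gives the full count.
wide = np.dot(a.astype(np.int32), b.astype(np.int32))

print(narrow.dtype, int(narrow))
print(wide.dtype, int(wide))

# For 0/1 inputs, AND matches multiplication and squaring changes nothing.
x = np.array([1, 1, 0, 0], dtype=np.int8)
y = np.array([1, 0, 1, 0], dtype=np.int8)
and_matches_product = np.array_equal(x & y, x * y)
square_is_identity = np.array_equal(x * x, x)
print(and_matches_product, square_is_identity)
```

Whether an overflow can actually occur in practice depends on how many ones a row can contain, which is why the accumulator type should be chosen with the array size in mind.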

import numba as nb
import numpy as np

@nb.njit('float64(int8[::1], int8[::1])')
def cosine_similarity(a,b):
    # Large safe integer type (int16 is less safe but certainly faster)
    dt = np.int32(0)
    for i in range(a.size):
        dt += a[i] & b[i]
    
    if dt == 0:
        return 0.0

    sa, sb = np.int32(0), np.int32(0)
    for i in range(a.size):
        sa += a[i]
        sb += b[i]
    return dt / np.sqrt(sa * sb)

@nb.njit('float64[:](int8[:,::1], int8[::1])', parallel=True)
def calculate_similarity_parallel(multiple_item, single_item):
    n = multiple_item.shape[0]
    scores = np.zeros(n, np.float64)
    for i in nb.prange(n):
        scores[i] = cosine_similarity(single_item, multiple_item[i])
    return scores

scores = calculate_similarity_parallel(
            multiple_item=data_matrix,
            single_item=current_cart
         )

This is several hundred times faster than the initial code with Numba disabled on my machine. Note that sa is recomputed for each row of data_matrix, which may be slower than computing it once.
