Giulio Mattera

Reputation: 1

TypingError: Failed in nopython mode pipeline (step: nopython frontend) Unknown attribute 'shape' of type float32

I am learning to use Numba to accelerate Python code. With this code:

from numba import cuda, vectorize
import numpy as np

@cuda.jit(device = True)
def pixel_count(img1,img2):
    count1 = 0
    count2 = 0
    for i in range(img1.shape[0]):
        for j in range(img1.shape[1]):
            if img1[i][j] > 200:
                count1 = count1 + 1
    i = 0; j = 0;
    for i in range(img2.shape[0]):
        for j in range(img2.shape[1]):
            if img2[i][j] > 200:
                count2 = count2 + 1
                         
    return count1, count2


@vectorize(['float32(float32,float32)'], target = 'cuda')
def cint(img1, img2):
    c1, c2 = pixel_count(img1, img2)
    res = c1-c2
    return res

A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255


res = cint(A,B)

I received the following error:

TypingError: No implementation of function Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x00000175A5BE8C70>) found for signature:

 pixel_count (float32, float32)

There are 2 candidate implementations:
  - Of which 2 did not match due to:
    Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-9169f440975d>: Line 4.
    With argument(s): '(float32, float32)':
    Rejected as the implementation raised a specific error:
      TypingError: Failed in nopython mode pipeline (step: nopython frontend)
      Unknown attribute 'shape' of type float32

 File "<ipython-input-33-9169f440975d>", line 8:
 def pixel_count(img1,img2):
     <source elided>
     count2 = 0
     for i in range(img1.shape[0]):
     ^
 
 During: typing of get attribute at <ipython-input-33-9169f440975d> (8)
 
 File "<ipython-input-33-9169f440975d>", line 8:
 def pixel_count(img1,img2):
     <source elided>
     count2 = 0
     for i in range(img1.shape[0]):
     ^

raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071

During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x00000175A5BE8C70>)
During: typing of call at (3)

EDIT

I changed the code like this using guvectorize:

@guvectorize(['(float32[:],float32[:], float32)'], '(), () -> ()',target = 'cuda')
def cint(img1, img2, res):
    c1, c2 = pixel_count(img1, img2)
    res = c1-c2


A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255


res = cint(A, B)

With this error:

TypingError: No implementation of function Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x000001C99C5D42E0>) found for signature:

 pixel_count (array(float32, 1d, A), array(float32, 1d, A))

There are 2 candidate implementations:

  • Of which 1 did not match due to:
    Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-5b0a51c1200a>: Line 1.
    With argument(s): '(array(float32, 1d, A), array(float32, 1d, A))':
    Rejected as the implementation raised a specific error:
      TypingError: Failed in nopython mode pipeline (step: nopython frontend)
      Internal error at <numba.core.typeinfer.StaticGetItemConstraint object at 0x000001C99DD239D0>.
      tuple index out of range
      During: typing of static-get-item at (9)
      Enable logging at debug level for details.

      File "", line 9:
      def pixel_count(img1,img2):
          for i in range(img1.shape[0]):
              for j in range(img1.shape[1]):
              ^

raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071

  • Of which 1 did not match due to:
    Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-5b0a51c1200a>: Line 1.
    With argument(s): '(array(float32, 1d, A), array(float32, 1d, A))':
    Rejected as the implementation raised a specific error:
      TypingError: Failed in nopython mode pipeline (step: nopython frontend)
      Internal error at <numba.core.typeinfer.StaticGetItemConstraint object at 0x000001C99DD52370>.
      tuple index out of range
      During: typing of static-get-item at (9)
      Enable logging at debug level for details.

      File "", line 9:
      def pixel_count(img1,img2):
          for i in range(img1.shape[0]):
              for j in range(img1.shape[1]):
              ^

raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071

During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x000001C99C5D42E0>)
During: typing of call at (23)

How can I use a cuda.jit device function together with vectorize/guvectorize?

EDIT 2

Thank you all for the responses. My goal was to understand how to solve this task on the GPU with Numba. The code is probably faster on the CPU because the matrices are small; the tips on parallel computing were very helpful. Do you have any other suggestions on how to port this code to the GPU? Thank you very much.

I modified the code as follows, but it always returns 0:

from numba import cuda, vectorize, guvectorize
import numpy as np


@cuda.jit(device = True)
def pixel_count(img1,img2):
    count1 = 0
    count2 = 0
    for i in range(img1.shape[0]):
        for j in range(img1.shape[1]):
            if img1[i][j] > 200:
                count1 = count1 + 1
    i = 0; j = 0;
    for i in range(img2.shape[0]):
        for j in range(img2.shape[1]):
            if img2[i][j] > 200:
                count2 = count2 + 1
                         
    return count1, count2

@guvectorize(['(float32[:,:],float32[:,:], int16)'],
             '(n,m), (n,m)-> ()', target = 'cuda')
def cint(img1, img2, res):
    count1, count2 = pixel_count(img1, img2)
    res = count1 - count2

A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255
res1 = cint(A, B)

Upvotes: 0

Views: 5087

Answers (1)

aerobiomat

Reputation: 3437

Not using CUDA, but this may give you some ideas:

Pure NumPy (already vectorized):

import numpy as np
import numba as nb  # used by the JIT examples further down

A = np.random.rand(480, 640).astype(np.float32) * 255
B = np.random.rand(480, 640).astype(np.float32) * 255

%timeit (A > 200).sum() - (B > 200).sum()
478 µs ± 4.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Simply wrapping the numpy operations in a JITted function:

@nb.njit
def pixel_count_jit(img):
    return (img > 200).sum()

%timeit pixel_count_jit(A) - pixel_count_jit(B)
165 µs ± 13.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Parallelizing with Numba by rows:

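@nb.njit(parallel=True)
def pixel_count_parallel(img):
    # One slot per row: counts is indexed by the row index i, so it must be
    # sized by img.shape[0]; otherwise uninitialized slots leak into the sum.
    counts = np.empty(img.shape[0], dtype=nb.uint32)
    for i in nb.prange(img.shape[0]):
        counts[i] = (img[i] > 200).sum()
    return counts.sum()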

%timeit pixel_count_parallel(A) - pixel_count_parallel(B)
28.5 µs ± 571 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Upvotes: 1
