Reputation: 19
I have written this code for vector addition using numba.SmartArrays. I am using this numba.SmartArrays for the first time. I am not sure how to use that. This code is not working and it is throwing errors.
import numpy as np
from numba import SmartArray,cuda, jit, uint32
li1=np.uint32([1,2,3,4])
li=np.uint32([1,2,3,4])
b=SmartArray(li,where="host",copy=True)
a=SmartArray(li1,where="host",copy=True)
c=np.uint32([1,1,1,1])
print type(li)
print type(a)
@cuda.jit('void(uint32[:],uint32[:],uint32[:])',type="gpu")
def additionG(c,a,b):
idx=cuda.threadIdx.x+cuda.blockDim.x*cuda.blockIdx.x
if idx< len(a):
a[idx]=c[idx]+b[idx]
dA=cuda.to_device(a)
dB=cuda.to_device(b)
dC=cuda.to_device(c)
additionG[1, 128](c,a,b)
print a.__array__()
Errors:
<type 'numpy.ndarray'>
<class 'numba.smartarray.SmartArray'>
Traceback (most recent call last):
File "C:\Users\hp-pc\My Documents\LiClipse Workspace\cuda\blowfishgpu_smart_arrays.py", line 20, in <module>
dA=cuda.to_device(a)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\devices.py", line 257, in _require_cuda_context
return fn(*args, **kws)
File "C:\Anaconda\lib\site-packages\numba\cuda\api.py", line 55, in to_device
to, new = devicearray.auto_device(obj, stream=stream, copy=copy)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 403, in auto_device
devobj.copy_to_device(obj, stream=stream)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 148, in copy_to_device
sz = min(_driver.host_memory_size(ary), self.alloc_size)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 1348, in host_memory_size
s, e = host_memory_extents(obj)
File "C:\Anaconda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 1333, in host_memory_extents
return mviewbuf.memoryview_get_extents(obj)
TypeError: expected a readable buffer object
Upvotes: 0
Views: 755
Reputation: 19
Its been a while since I posted this question. Still posting the answer so that someone may find it helpful in future.
import numpy as np
from numba import SmartArray,cuda, jit, uint32,autojit
li1=np.uint32([6,7,8,9])
li=np.uint32([1,2,3,4])
a=SmartArray(li1,where='host',copy=True)
b=SmartArray(li,where="host",copy=True)
c=np.uint32([1,1,1,1])
def additionG(a,c):
idx=cuda.threadIdx.x+cuda.blockDim.x*cuda.blockIdx.x
if idx < len(c):
a[idx]=a[idx]+c[idx]
cuda.syncthreads()
bpg=1
tpb=128
dC=cuda.to_device(c)
cfunc = cuda.jit()(additionG)
cfunc[bpg, tpb](a,dC)
print a.__array__()
Upvotes: 2
Reputation: 72349
It looks to me like cuda.to_device
doesn't handle smart arrays, which would sort of make sense, because smart arrays are supposed to do away with explicit copy management.
If my reading of the documentation is correct (I have never tried SmartArray
before), you should just be able to change this
dA=cuda.to_device(a)
dB=cuda.to_device(b)
dC=cuda.to_device(c)
additionG[1, 128](c,a,b)
to just
dC=cuda.to_device(c)
additionG[1, 128](dC,a.gpu(),b.gpu())
The .gpu()
method should return a GPU resident object that the kernel can understand and access.
Upvotes: 1