Reputation: 87
When I try the code below with int d[1]; it works fine, but with int d[in_integer]; or int d[in_matrix[0]]; nvcc compilation fails. Can anyone suggest why? Is it possible to declare array in pycuda with size determined by function parameter value?
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import os
import numpy as np
_path = r"D:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\bin\Hostx64\x64"
if os.system("cl.exe"):
    os.environ['PATH'] += ';' + _path
if os.system("cl.exe"):
    raise RuntimeError("cl.exe still not found, path probably incorrect")
a = np.asarray([1, 2])
a = a.astype(np.int32)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
b = np.zeros(1).astype(np.int32)
b_gpu = cuda.mem_alloc(b.nbytes)
cuda.memcpy_htod(b_gpu, b)
mod = SourceModule("""
__global__ void array_declaration(int *in_matrix, int *out_matrix, int in_integer)
{
    int d[1];
    //int d[in_integer];
    //int d[in_matrix[0]];
    out_matrix[0] = in_matrix[0];
}
""")
func = mod.get_function("array_declaration")
func(a_gpu, b_gpu, np.int32(1), block=(1,1,1))
b_out = np.empty_like(b)
cuda.memcpy_dtoh(b_out, b_gpu)
print(b_out)
The error is as follows:
Traceback (most recent call last):
  File "D:\PythonProjects\TradeAnalysis\Test\TestCUDAArrayDeclaration.py", line 37, in <module>
    """)
  File "D:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 291, in __init__
    arch, code, cache_dir, include_dirs)
  File "D:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 254, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir, target)
  File "D:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 137, in compile_plain
    stderr=stderr.decode("utf-8", "replace"))
CompileError: nvcc compilation of C:\Users\HENRYC~1\AppData\Local\Temp\tmpy0kcq_5i\kernel.cu failed
Upvotes: 0
Views: 162
Reputation: 152279
Is it possible to declare array in pycuda with size determined by function parameter value?
Instead of this:
int d[in_integer];
do this:
int *d = new int[in_integer];
or, equivalently:
int *d = (int *)malloc(in_integer*sizeof(d[0]));
Technically this is a pointer with an allocation, not an array, but it will function similarly for most use cases in C++ or CUDA C++.
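To show how this substitution fits into the question's script, here is a minimal sketch rather than a definitive implementation. It keeps the question's kernel name and parameters, replaces the variable-length array with new int[in_integer], and assumes in_integer is at least 1. The helper name run_array_declaration and the -1 error marker are hypothetical additions for illustration. The kernel also checks the pointer before use and frees it afterwards, both of which are prudent with in-kernel allocation.

```python
# The question's kernel with the runtime-sized array replaced by an
# in-kernel heap allocation. Assumes in_integer >= 1.
KERNEL_SOURCE = r"""
__global__ void array_declaration(int *in_matrix, int *out_matrix, int in_integer)
{
    // Runtime-sized allocation instead of: int d[in_integer];
    int *d = new int[in_integer];
    if (!d) {               // allocation failed (device heap exhausted)
        out_matrix[0] = -1; // hypothetical error marker, not from the original
        return;
    }
    d[0] = in_matrix[0];
    out_matrix[0] = d[0];
    delete[] d;             // release the device heap allocation
}
"""


def run_array_declaration(values, in_integer):
    """Compile and launch the kernel; needs PyCUDA and a CUDA-capable GPU."""
    # Imported here so the kernel source can be inspected without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    a = np.asarray(values, dtype=np.int32)
    b = np.zeros(1, dtype=np.int32)
    func = SourceModule(KERNEL_SOURCE).get_function("array_declaration")
    # cuda.In / cuda.Out take care of the host-to-device and device-to-host copies.
    func(cuda.In(a), cuda.Out(b), np.int32(in_integer), block=(1, 1, 1))
    return b
```

On a machine with a working PyCUDA setup, run_array_declaration([1, 2], 1) is intended to mirror the question's script, returning a one-element array that holds the first input value.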
This sort of in-kernel device memory allocation has a variety of caveats:
by default the total allocated space (the number of threads that currently have an allocation open times the allocation size) used by your kernel cannot exceed 8 MB. This limit is adjustable with cudaDeviceSetLimit() and cudaLimitMallocHeapSize.
you should typically have a corresponding C++ delete[] operation after you are finished using the pointer (or free() if you used malloc()), in your kernel code (which also may help with the item above)
the CUDA device runtime indicates an allocation error here by returning a NULL pointer. When having trouble, it's good practice to check for NULL in the kernel code before using the pointer.
the device allocation step itself can noticeably impact kernel performance (execution duration). If your kernel is doing a lot of work/many other things, it may not be noticeable at all. For kernels doing little work, it may be very noticeable. Once the allocation is completed, using the allocated pointer shouldn't have any significant performance difference from using an array definition where the size is known as a compile-time constant.
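The heap-size caveat can be estimated before launching. The sketch below is a rough illustration under stated assumptions: required_heap_bytes and ensure_device_heap are hypothetical helper names, the estimate ignores allocator bookkeeping and alignment (so treat it as a lower bound), and the PyCUDA calls Context.set_limit / Context.get_limit with limit.MALLOC_HEAP_SIZE are used as I understand that API. The limit generally has to be set before the first kernel that allocates on the device heap is launched.

```python
DEFAULT_DEVICE_HEAP_BYTES = 8 * 1024 * 1024  # default in-kernel heap size (8 MB)


def required_heap_bytes(concurrent_threads, elements_per_thread, itemsize=4):
    """Lower bound on device heap use when every thread holds one open allocation."""
    return concurrent_threads * elements_per_thread * itemsize


def ensure_device_heap(needed_bytes):
    """Raise the device malloc heap limit if needed; requires PyCUDA and a GPU."""
    # Imported here so the size estimate above works without a GPU.
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    import pycuda.driver as cuda

    if needed_bytes > DEFAULT_DEVICE_HEAP_BYTES:
        cuda.Context.set_limit(cuda.limit.MALLOC_HEAP_SIZE, needed_bytes)
    return cuda.Context.get_limit(cuda.limit.MALLOC_HEAP_SIZE)
```

For example, 1024 threads each allocating 4096 ints need at least 1024 * 4096 * 4 bytes = 16 MB, which is twice the default, so such a launch would likely see NULL returns unless the limit is raised first.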
Upvotes: 1