My algorithm (parallel multifrontal Gaussian elimination) needs to allocate memory dynamically inside a CUDA kernel (for tree building). Does anyone know whether gpuocelot supports this?
According to this: stackoverflow-link and the CUDA programming guide, in-kernel allocation is possible. But with gpuocelot I get errors at runtime.
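For reference, a minimal sketch of the kind of in-kernel allocation in question. It is not the solver code: the kernel name `buildNodes`, the node size, and the 8 MB heap limit are hypothetical, and it assumes a device of compute capability 2.0 or higher, which device-side `malloc()` requires.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates a small "tree node" on the
// device heap, writes to it, records whether the allocation worked, and
// frees it again.
__global__ void buildNodes(int *ok)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    int *node = (int *)malloc(4 * sizeof(int));  // device-side malloc
    if (node == NULL) {                          // device heap exhausted
        ok[tid] = 0;
        return;
    }
    node[0] = tid;
    ok[tid] = 1;
    free(node);                                  // device-side free
}

int main()
{
    const int blocks = 2, threads = 32, n = blocks * threads;

    // The device heap size has to be set before the first kernel that
    // calls malloc() is launched. 8 MB is an arbitrary choice here.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 8 * 1024 * 1024);

    int *d_ok = NULL;
    cudaMalloc((void **)&d_ok, n * sizeof(int));

    buildNodes<<<blocks, threads>>>(d_ok);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int h_ok[n];
    cudaMemcpy(h_ok, d_ok, n * sizeof(int), cudaMemcpyDeviceToHost);

    int succeeded = 0;
    for (int i = 0; i < n; ++i)
        succeeded += h_ok[i];
    printf("%d of %d allocations succeeded\n", succeeded, n);

    cudaFree(d_ok);
    return 0;
}
```

Checking the pointer returned by `malloc()` matters because the device heap is fixed in size once the first allocating kernel has run, so a tree-building kernel can run out of space partway through.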
Errors:
With malloc() inside the kernel I get this error:
(2.000239) ExternalFunctionSet.cpp:371: Assertion message: LLVM required to call external host functions from PTX.
solver: ocelot/ir/implementation/ExternalFunctionSet.cpp:371: void ir::ExternalFunctionSet::ExternalFunction::call(void*, const ir::PTXKernel::Prototype&): Assertion `false' failed.
solver: ocelot/cuda/implementation/CudaRuntimeInterface.cpp:811: virtual cudaError_t cuda::CudaRuntimeInterface::cudaDeviceGetLimit(size_t*, cudaLimit): Assertion `0 && "unimplemented"' failed.
Maybe I have to tell the compiler (somehow) that I want to use the device malloc()?
Any advice?