Reputation: 495
I'm trying to write an OpenCL wrapper in C++. Yesterday I was working on my Windows 10 machine (NVIDIA GTX970 Ti, latest NVIDIA GeForce drivers I believe) and my code worked flawless.
Today, I'm trying it out on my laptop (Arch Linux, AMD Radeon R7 M265, Mesa 17.3.3) and I get a segfault when trying to create a command queue.
Here's the GDB backtrace:
#0 0x00007f361119db80 in ?? () from /usr/lib/libMesaOpenCL.so.1
#1 0x00007f36125dacb1 in clCreateCommandQueueWithProperties () from /usr/lib/libOpenCL.so.1
#2 0x0000557b2877dfec in OpenCL::createCommandQueue (ctx=..., dev=..., outOfOrderExec=false, profiling=false) at /home/***/OpenCL/Util.cpp:296
#3 0x0000557b2876f0cf in main (argc=1, argv=0x7ffd04fcdac8) at /home/***/main.cpp:27
#4 0x00007f361194cf4a in __libc_start_main () from /usr/lib/libc.so.6
#5 0x0000557b2876ecfa in _start ()
(I've censored part of the paths) Here's the code that's producing the error:
CommandQueue createCommandQueue(Context ctx, Device dev, bool outOfOrderExec, bool profiling) noexcept
{
cl_command_queue_properties props [3]= {CL_QUEUE_PROPERTIES, 0, 0};
if (outOfOrderExec)
{
props[1] |= CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE;
}
if (profiling)
{
props[1] |= CL_QUEUE_PROFILING_ENABLE;
}
int error = CL_SUCCESS;
cl_command_queue queue = clCreateCommandQueueWithProperties(ctx.get(), dev.get(), props, &error);
if (error != CL_SUCCESS)
{
std::cerr << "Error while creating command queue: " << OpenCL::getErrorString(error) << std::endl;
}
CommandQueue commQueue = CommandQueue(queue);
Session::get().registerQueue(commQueue);
return commQueue;
}
The line with clCreateCommandQueueWithProperties
is where the segfault happens.
Context
is a wrapper class for a cl_context
, Context::get()
returns the original cl_context:
class Context
{
private:
...
cl_context context;
public:
...
cl_context get() const noexcept;
...
};
Device
is a wrapper for cl_device
, Device::get()
also returns the cl_device:
class Device
{
private:
...
cl_device_type type;
cl_device_id id;
public:
...
cl_device_id get() const noexcept;
cl_device_type getType () const noexcept;
...
};
Here's the main function:
int main (int argc, char* argv [])
{
OpenCL::Session::get().init();
for (const std::string& deviceAddress : OpenCL::Session::get().getAddresses())
{
std::cout << "[" << deviceAddress << "]: " << OpenCL::Session::get().getDevice(deviceAddress);
}
OpenCL::Context ctx = OpenCL::getContext();
std::cout << "OpenCL version: " << ctx.getVersionString() << std::endl;
OpenCL::Kernel kernel = OpenCL::createKernel(OpenCL::createProgram("src/Kernels/Hello.cl", ctx), "SAXPY");
OpenCL::CommandQueue queue = OpenCL::createCommandQueue(ctx, OpenCL::Session::get().getDevice(ctx.getAssociatedDevices()[0]));
unsigned int testDataSize = 1 << 13;
std::vector <float> a = std::vector <float> (testDataSize);
std::vector <float> b = std::vector <float> (testDataSize);
for (int i = 0; i < testDataSize; i++)
{
a[i] = static_cast<float>(i);
b[i] = 0.0;
}
OpenCL::Buffer aBuffer = OpenCL::allocateBuffer(ctx, a.data(), sizeof(float), a.size());
OpenCL::Buffer bBuffer = OpenCL::allocateBuffer(ctx, b.data(), sizeof(float), b.size());
kernel.setArgument(0, aBuffer);
kernel.setArgument(1, bBuffer);
kernel.setArgument(2, 2.0f);
OpenCL::Event saxpy_event = queue.enqueue(kernel, {testDataSize});
OpenCL::Event read_event = queue.read(bBuffer, b.data(), bBuffer.size());
std::cout << "SAXPY kernel took " << saxpy_event.getRunTime() << "ns to complete." << std::endl;
std::cout << "Read took " << read_event.getRunTime() << "ns to complete." << std::endl;
OpenCL::Session::get().cleanup();
return 0;
}
(The profiling won't work because I've disabled it (thinking that was the cause of the problem), re-enabling profiling doesn't fix the issue however).
I'm using CLion, so here's a screenshot of my debugging window:
Finally here's the console output of the program:
/home/***/cmake-build-debug/Main
[gpu0:0]: AMD - AMD OLAND (DRM 2.50.0 / 4.14.15-1-ARCH, LLVM 5.0.1): 6 compute units @ 825MHz
OpenCL version: OpenCL 1.1 Mesa 17.3.3
Signal: SIGSEGV (Segmentation fault)
The context and device objects all seem to have been created without any issues so I really have no idea what's causing the segfault.
Is it possible that I've found a bug in the Mesa driver, or am I missing something obvious?
Edit: This person seems to have had a similar problem, unfortunately, his problem was just a C-style-forget-to-allocate-memory-problem.
2nd Edit: I may have found a possible cause of this problem, CMake is finding, using and linking against OpenCL 2.0 while my GPU only supports OpenCL 1.1. I'll look into this. I haven't found a way to roll back to OpenCL 1.1 on Arch Linux, but clinfo seems to be working fine and so is blender (which depends on OpenCL), so I don't think this is the problem.
Here's the output from clinfo:
Number of platforms 1
Platform Name Clover
Platform Vendor Mesa
Platform Version OpenCL 1.1 Mesa 17.3.3
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix MESA
Platform Name Clover
Number of devices 1
Device Name AMD OLAND (DRM 2.50.0 / 4.14.15-1-ARCH, LLVM 5.0.1)
Device Vendor AMD
Device Vendor ID 0x1002
Device Version OpenCL 1.1 Mesa 17.3.3
Driver Version 17.3.3
Device OpenCL C Version OpenCL C 1.1
Device Type GPU
Device Available Yes
Device Profile FULL_PROFILE
Max compute units 6
Max clock frequency 825MHz
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Compiler Available Yes
Preferred work group size multiple 64
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 8 / 8 (cl_khr_fp16)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 2147483648 (2GiB)
Error Correction support No
Max memory allocation 1503238553 (1.4GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 32768 bits (4096 bytes)
Global Memory cache type None
Image support No
Local memory type Local
Local memory size 32768 (32KiB)
Max constant buffer size 1503238553 (1.4GiB)
Max number of constant args 16
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Profiling timer resolution 0ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Clover
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [MESA]
clCreateContext(NULL, ...) [default] Success [MESA]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Clover
Device Name AMD OLAND (DRM 2.50.0 / 4.14.15-1-ARCH, LLVM 5.0.1)
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Clover
Device Name AMD OLAND (DRM 2.50.0 / 4.14.15-1-ARCH, LLVM 5.0.1)
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Clover
Device Name AMD OLAND (DRM 2.50.0 / 4.14.15-1-ARCH, LLVM 5.0.1)
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.12
ICD loader Profile OpenCL 2.2
3rd Edit: I've just run the code on my NVIDIA machine, works without issue, this is what the console shows:
[gpu0:0]: NVIDIA Corporation - GeForce GTX 970: 13 compute units @ 1253MHz
OpenCL version: OpenCL 1.2 CUDA 9.1.75
SAXPY kernel took 2368149686ns to complete.
Read took 2368158390ns to complete.
I've also fixed the 2 things Andreas mentioned
Upvotes: 3
Views: 2725
Reputation: 29
clCreateCommandQueue
is deprecated in OpenCL 1.2
It means you can use it both with or without properties.
Upvotes: 1
Reputation: 6343
clCreateCommandQueueWithProperties
got added in OpenCL 2.0. You should not use it with platforms and devices that are less than version 2.0 (such as 1.1 and 1.2 shown in your logs).
Upvotes: 4