Reputation: 3291
I have created my context from multiple GPU devices as shown below:
type = platforms[0].getDevices(CL_DEVICE_TYPE_GPU, &devices);
if (type == CL_SUCCESS)
{
    // Create context and access device names
    cl::Context ctx_(devices);
    context = ctx_;
    gpuDevices = context.getInfo<CL_CONTEXT_DEVICES>();
    for (i = 0; i < gpuDevices.size(); i++) {
        deviceName = gpuDevices[i].getInfo<CL_DEVICE_NAME>();
        queues.emplace_back(cl::CommandQueue(context, gpuDevices[i], CL_QUEUE_PROFILING_ENABLE));
        op::log("Adding " + deviceName + " to queue");
    }
}
else if (type == CL_INVALID_DEVICE_TYPE || type == CL_DEVICE_NOT_FOUND)
{
    throw std::runtime_error("Error: GPU Invalid Device or Device not found");
}
break;
However, when I create a cl::Buffer, it only allows me to pass in one context. How does one select which GPU the memory gets created on?
The constructor of cl::Buffer is:
Buffer(
    const Context& context,
    cl_mem_flags flags,
    ::size_t size,
    void* host_ptr = NULL,
    cl_int* err = NULL)
As you can see, it only takes one context, so I can't select which GPU the buffer is created on.
Upvotes: 4
Views: 1416
Reputation: 23438
Even in contexts with just one device, buffer objects may be resident on both the device and the host. If you fill the buffer using clEnqueueWriteBuffer, this will take place on a specific command queue, and the buffer will thus be associated with a specific device. It stands to reason that most implementations will, in this case, allocate memory on the device corresponding to that queue and use its DMA engine to fill the buffer. You don't have any lower-level control than this in OpenCL, however.
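For instance, a minimal sketch of steering the initial placement this way, reusing the context and queues from the question (the 1024-element payload is just an example):
// Sketch: fill a shared buffer through device 0's queue. Most
// implementations will then allocate the backing memory on that
// device, though the standard does not guarantee it.
std::vector<float> hostData(1024, 1.0f);
cl::Buffer buf(context, CL_MEM_READ_WRITE, hostData.size() * sizeof(float));
queues[0].enqueueWriteBuffer(buf, CL_TRUE /* blocking */, 0,
                             hostData.size() * sizeof(float), hostData.data());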
So if you keep using the same buffer on different queues with different devices then, depending on the access mode and on how the implementation is written, there might be multiple copies floating around, or the implementation may keep moving the memory between devices. Profiling will tell you whether you're better off with separate contexts or a shared one.
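Since the queues in the question were created with CL_QUEUE_PROFILING_ENABLE, a rough sketch of such a measurement (reusing buf and hostData from the previous sketch) could be:
cl::Event ev;
// Non-blocking write; the event records when the transfer actually ran.
queues[0].enqueueWriteBuffer(buf, CL_FALSE, 0,
                             hostData.size() * sizeof(float),
                             hostData.data(), nullptr, &ev);
ev.wait();
// Profiling timestamps are in nanoseconds.
cl_ulong start = ev.getProfilingInfo<CL_PROFILING_COMMAND_START>();
cl_ulong end = ev.getProfilingInfo<CL_PROFILING_COMMAND_END>();
double ms = (end - start) * 1e-6; // compare across context layouts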
Upvotes: 0
Reputation: 699
When you create a buffer for a context that is shared between multiple devices, the buffer is "shared" between these devices, so you can execute commands on any of them using the same cl_mem object. Whether the memory holding this buffer is actually allocated on both devices is implementation-defined. The OpenCL driver may defer the actual allocation until the buffer is needed by a command executing on a particular device, and it is usually smart enough to do this, but that really depends on the hardware and the implementation details.
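For illustration, a minimal sketch of that usage pattern, assuming a hypothetical kernel object and element count N:
// Sketch: one cl_mem used from two devices' queues in the shared
// context. Where (and when) the driver physically allocates or
// migrates the memory is up to the implementation.
cl::Buffer shared(context, CL_MEM_READ_WRITE, N * sizeof(float));
kernel.setArg(0, shared);
queues[0].enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(N));
queues[0].finish(); // complete on GPU 0 before GPU 1 touches the data
queues[1].enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(N));
queues[1].finish();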
Basically, you have 2 options:

- Keep a single shared context and let the implementation manage buffer placement and migration between the devices, as described above.
- Create a separate cl::Context for each device, so that every buffer unambiguously belongs to one GPU (see the sketch below).
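A minimal sketch of the second option, reusing gpuDevices from the question (N is again a hypothetical element count):
std::vector<cl::Context> contexts;
std::vector<cl::CommandQueue> queuesPerDevice;
std::vector<cl::Buffer> buffers;
for (auto& dev : gpuDevices) {
    // One context per device: each buffer below is guaranteed to
    // belong to exactly one GPU.
    contexts.emplace_back(dev);
    queuesPerDevice.emplace_back(contexts.back(), dev, CL_QUEUE_PROFILING_ENABLE);
    buffers.emplace_back(contexts.back(), CL_MEM_READ_WRITE, N * sizeof(float));
}
// Trade-off: programs and kernels must be built once per context, and
// any data exchanged between the GPUs has to be staged through the host.

Upvotes: 1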