VedhaR

Reputation: 507

OpenCL - clCreateBuffer size error. Possible workarounds?

After investigating why my program was crashing, I found that I was hitting the maximum size for a single buffer, which is 512 MB on my device (CL_DEVICE_MAX_MEM_ALLOC_SIZE).

In my case, here are the parameters.

P = 146 (interpolation factor)
num_items = 918144 (number of samples)
sizeof(float) -> 4

So my clCreateBuffer looks something like this:

output = clCreateBuffer(
        context,
        CL_MEM_READ_ONLY,
        num_items * P * sizeof(float),
        NULL,
        &status);

Multiplying these together and dividing by (1024 x 1024) gives about 511 MB, which is just under the limit. Increase P to 147, for example, and the program crashes because the requested size exceeds 512 MB.
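The arithmetic above can be checked on the host before calling clCreateBuffer. This is only a minimal sketch: the helper names buffer_bytes and fits_in_one_buffer are hypothetical, and max_alloc is assumed to hold the value reported for CL_DEVICE_MAX_MEM_ALLOC_SIZE (512 MB here).

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for num_items * p floats. The product is computed in
   64 bits so it cannot wrap where size_t is only 32 bits wide. */
static uint64_t buffer_bytes(uint64_t num_items, uint64_t p)
{
    return num_items * p * (uint64_t)sizeof(float);
}

/* Returns 1 if the request fits in a single allocation of at most
   max_alloc bytes, 0 otherwise. max_alloc is assumed to come from
   CL_DEVICE_MAX_MEM_ALLOC_SIZE (hypothetical caller). */
static int fits_in_one_buffer(uint64_t num_items, uint64_t p,
                              uint64_t max_alloc)
{
    assert(max_alloc > 0);
    return buffer_bytes(num_items, p) <= max_alloc;
}
```

With num_items = 918144 and P = 146 the request is 536,196,096 bytes, which fits under 536,870,912 (512 x 1024 x 1024); with P = 147 it is 539,868,672 bytes, which does not.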

My question is: how can I process the data in blocks instead of storing everything in memory and passing that one massive chunk to the kernel? In practice the number of samples could easily exceed 5 million, and I definitely will not have enough memory to store all of those values.

I'm just not sure how to pass small sets of values into my kernel, as the values go through three steps before producing an output.

First is an interpolation kernel, then the values go to a lowpass filter kernel, and then to a kernel that does decimation. After that, the values are written to an output array. If further details of the program are needed, I can add more.
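One way to think about processing in blocks is to pick the largest number of input samples whose interpolated output still fits in one buffer, then walk the input in chunks of that size. The sketch below only covers the host-side bookkeeping. The helper names are hypothetical, max_alloc is again assumed to be the CL_DEVICE_MAX_MEM_ALLOC_SIZE value, and any overlap of samples that the lowpass filter may need at chunk boundaries is not handled, since the filter length is not given here.

```c
#include <assert.h>
#include <stdint.h>

/* Largest number of input samples whose interpolated output
   (samples * p floats) fits in one allocation of max_alloc bytes. */
static uint64_t max_chunk_items(uint64_t max_alloc, uint64_t p)
{
    assert(p > 0);
    return max_alloc / (p * (uint64_t)sizeof(float));
}

/* Number of chunks needed to cover num_items (ceiling division). */
static uint64_t chunk_count(uint64_t num_items, uint64_t chunk_items)
{
    assert(chunk_items > 0);
    return (num_items + chunk_items - 1) / chunk_items;
}

/* Number of samples in chunk `index`; the last chunk may be shorter. */
static uint64_t chunk_len(uint64_t num_items, uint64_t chunk_items,
                          uint64_t index)
{
    uint64_t start = index * chunk_items;
    assert(start < num_items);
    uint64_t remaining = num_items - start;
    return remaining < chunk_items ? remaining : chunk_items;
}
```

With a 512 MB limit and P = 146, each chunk holds at most 919,299 input samples, so 5,000,000 samples would take 6 chunks, the last one 403,505 samples long. A host loop could then, for each chunk, upload that slice of the input with clEnqueueWriteBuffer, run the three kernels, and read the decimated result back with clEnqueueReadBuffer, reusing the same device buffers each time.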

UPDATE: I'm not sure what the expected answer is here. If anyone has an explanation, I would love to hear it and potentially accept it as the valid answer. I don't work with OpenCL anymore, so I don't have the setup to verify it.

Upvotes: 3

Views: 2123

Answers (1)

Andreas

Reputation: 5301

Looking at the OpenCL specification for clCreateBuffer, I would say the solution here is to allow the use of host memory by adding CL_MEM_USE_HOST_PTR to the flags (or whichever flag suits your use case). The relevant paragraphs from the description of CL_MEM_USE_HOST_PTR:

This flag is valid only if host_ptr is not NULL. If specified, it indicates that the application wants the OpenCL implementation to use memory referenced by host_ptr as the storage bits for the memory object.

The contents of the memory pointed to by host_ptr at the time of the clCreateBuffer call define the initial contents of the buffer object.

OpenCL implementations are allowed to cache the buffer contents pointed to by host_ptr in device memory. This cached copy can be used when kernels are executed on a device.

This means the driver will move memory between host and device in the most efficient way it can. It is essentially what you propose yourself in the comments, except that it is already built into the driver, is activated with a single flag, and is probably more efficient than anything you could write yourself.

Upvotes: 1
