OpenCL host readable "thread local" memory

Question

I need (host readable) thread local memory for my OpenCL kernel. Let's look at an example:

//Assume #threads is known to be 8
double threadLocalScalar[8] = {}; //1 scalar per thread
for(some range in parallel)
    threadLocalScalar[getThreadId()] += 1;

This is a basic "thread local memory" solution: an array of length #threads, indexed by the result of getThreadId().

Now I need to do the same (or anything that works the same) in OpenCL. My idea so far is to use get_group_id(0) to get the work-group ID (and, for simplicity, maybe use a work-group size of 1). That way I know which "thread" is executing and can modify the correct part of a vector in global memory.

However, I don't know how many "threads" are going to be created, so I can't determine how much global memory I need for the threadLocalScalar vector. How can I find this out? Or do you know of a better solution? Is my approach even correct?

Note: The problem with using local memory is that I cannot read it from the host. Otherwise I could simply use local memory with a work-group size of 1 (only one "thread" per work group, meaning the local memory is effectively "thread local").

Dithermaster · Accepted Answer

You only need shared local memory if adjacent work items need access to the same global memory locations; using shared local memory means you only have to read the global value once per work group instead of multiple times.

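Your code example (threadLocalScalar[getThreadId()] += 1) doesn't do that. It just increments one memory location per work item (independent work), so you don't need shared local memory. Just index your global memory with get_global_id(0) and increment that element. The host chooses the global work size when it enqueues the kernel, so it knows how many elements the buffer needs.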
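
As a rough illustration of the indexing described above, here is a minimal sketch. It assumes a one-dimensional range and emulates it in plain C so it can be run without an OpenCL runtime: the `kernel` and `global` qualifiers and `get_global_id` are stubbed out, and `increment_per_item`, `run_emulated_ndrange` and `perItemScalar` are hypothetical names chosen for this example, not part of the question's code. The kernel body itself is written the way it might look in OpenCL C.

```c
#include <assert.h>
#include <stddef.h>

/* Emulation-only stand-ins (assumption): in real OpenCL C, `kernel` and
   `global` are qualifiers and get_global_id() is a built-in function. */
#define kernel
#define global
static size_t emulated_global_id;

static size_t get_global_id(unsigned int dim)
{
    assert(dim == 0);  /* only dimension 0 is emulated here */
    return emulated_global_id;
}

/* Kernel body: each work item touches only its own element, so no local
   memory and no synchronization are needed. */
kernel void increment_per_item(global double *perItemScalar)
{
    size_t gid = get_global_id(0);
    perItemScalar[gid] += 1.0;
}

/* Hypothetical host-side helper: runs the kernel body once per work item,
   sequentially. global_size plays the role of the global work size, which
   the host chooses and therefore already knows when sizing the buffer. */
void run_emulated_ndrange(double *perItemScalar, size_t global_size)
{
    for (emulated_global_id = 0; emulated_global_id < global_size;
         ++emulated_global_id)
        increment_per_item(perItemScalar);
}
```

With a real OpenCL runtime, the same count would presumably be used both to size the buffer (global size times sizeof(double)) and as the global_work_size argument of clEnqueueNDRangeKernel, and the results could be copied back to the host with clEnqueueReadBuffer. Double precision is not supported on every device, so a float buffer may be needed on some hardware.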
