Reputation: 175
If I have 2 work dimensions and set the local_work_size argument of clEnqueueNDRangeKernel to, say, {4, 4}, would a single work-group consist of 4*4=16 local work-items or just 4 of them?
There is an image that describes the 1-dimensional case, in which each work-group contains all local work-items of the only dimension there is, but I don't know how that extends to the 2-dimensional case, hence the question.
(source: fixstars.com)
Upvotes: 1
Views: 108
Reputation: 1091
You are correct in your assumption that a local work-size of {4, 4} will yield 16 work-items per work-group. Here is an image showing this.
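The arithmetic can be sketched in plain C, without any OpenCL calls. The helper names work_group_size and work_group_count are made up for this illustration and are not part of the OpenCL API. The sketch assumes every global size is an exact multiple of the matching local size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: work-items in one work-group, i.e. the product
   of the local sizes over all work dimensions. */
size_t work_group_size(unsigned work_dim, const size_t *local_work_size)
{
    assert(work_dim >= 1);
    size_t n = 1;
    for (unsigned d = 0; d < work_dim; ++d)
        n *= local_work_size[d];
    return n;
}

/* Hypothetical helper: total number of work-groups, assuming each global
   size divides evenly by the corresponding local size. */
size_t work_group_count(unsigned work_dim,
                        const size_t *global_work_size,
                        const size_t *local_work_size)
{
    assert(work_dim >= 1);
    size_t n = 1;
    for (unsigned d = 0; d < work_dim; ++d) {
        assert(global_work_size[d] % local_work_size[d] == 0);
        n *= global_work_size[d] / local_work_size[d];
    }
    return n;
}
```

For a local size of {4, 4}, work_group_size(2, local) is 16. With a global size of {8, 8}, for example, that gives 2*2=4 work-groups of 16 work-items each.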
Additional info (in case you need it): The number of dimensions to choose depends strongly on your actual problem, but also on memory access patterns and optimization potential. However, most problems can be solved using 1-dimensional work-sizes (even when working with 2-dimensional data), especially if processing an element does not involve its neighboring values.
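As a rough sketch of the 1-dimensional approach on 2-dimensional data: with a row-major layout, a flat work-item index can be turned back into a row and column by division and remainder. The cell_t type and the cell_from_flat_id function are hypothetical names for this example. In a real kernel the flat index would come from get_global_id(0), and the row width would have to be passed in as an argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: position of one element in a 2-dimensional grid. */
typedef struct {
    size_t row;
    size_t col;
} cell_t;

/* Hypothetical helper: map a flat 1-dimensional index onto a row-major
   grid that is `width` elements wide. */
cell_t cell_from_flat_id(size_t flat_id, size_t width)
{
    assert(width > 0);
    cell_t c = { flat_id / width, flat_id % width };
    return c;
}
```

This works well when each element is processed independently. When neighboring values are involved, a 2-dimensional work-size may be the more natural fit.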
Upvotes: 1