Influence of work size dimension to performance OpenCL

Question

I have initially work units with the size of 11*11*6779. For the sake of simplicity I dont want to translate it into 1D global work size. When when i changed it into 21*21*6779 the performance is 5-6x slower than before. the code as far as i know has nothing to do with the number of threads being ran.

The amount of data transfered is only 4x bigger, which I dont think is a reason why the programm runs slower, because i tested the memory allocation process.

Note that my device has a max work items of 256*256*256, meaning I would be use half of all available work items, and this is not a dedicated device (also used for display..).

I wonder if setting the work item sizes into 21*21*6779 uses too many of my work items, or the dimensions are simply inconvenient for openCL to adjust ?

Influence of work size dimension to performance OpenCL

Answers (1)

Related Questions