Reputation: 13
I'm working with OpenCL on a matrix whose values I increment, and I need the run time to be as low as possible. What is the best way to improve performance with OpenCL? I've read a bit about data parallelism and task parallelism, but I don't understand them very well.
The matrix is 64x56. Using task parallelism, I have created 64 kernel functions, one for each column, but I think it could be done much better.
Upvotes: 1
Views: 187
Reputation: 673
If you are executing the kernel on a GPU, it might be better to have one thread (work-item) handle one element. However, it depends on what exactly you are doing with the elements of the matrix, e.g. how many operations you perform on each of them. If you just add some number to each element, it might not be beneficial, because there is very little work per element to offset the cost of launching the kernel and moving the data.
In general, there are 3 options:
Have you tried using just one kernel that handles a single element, and calling clEnqueueNDRangeKernel for it with a global work size of {64, 56}? How does that affect the execution time?
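To make the suggestion concrete, here is a minimal sketch of what a single one-element-per-work-item kernel could look like. Everything named here is an assumption for illustration, not taken from the question: the kernel name `increment`, the `float` element type, the `delta` argument, and a row-major layout with 64 columns (dimension 0) and 56 rows (dimension 1). The kernel is kept as a string the way host code would pass it to `clCreateProgramWithSource`, and the plain C loop below it only mirrors on the CPU what the 2D NDRange would do, so the indexing can be checked without an OpenCL device.

```c
#include <stddef.h>

/* Assumed shape: 64 columns x 56 rows, stored row-major. */
#define COLS 64
#define ROWS 56

/* OpenCL C source for a hypothetical kernel: each work-item updates
 * exactly one matrix element, addressed by its 2D global id. */
const char *const increment_kernel_src =
    "__kernel void increment(__global float *m, const float delta)\n"
    "{\n"
    "    size_t col = get_global_id(0);\n"
    "    size_t row = get_global_id(1);\n"
    "    m[row * get_global_size(0) + col] += delta;\n"
    "}\n";

/* Global work size for a 2D launch: one work-item per element. */
const size_t global_work_size[2] = { COLS, ROWS };

/* With a real context, queue, kernel and buffer (not created here),
 * the launch would look roughly like this:
 *
 *   clSetKernelArg(kernel, 0, sizeof(cl_mem), &buffer);
 *   clSetKernelArg(kernel, 1, sizeof(float), &delta);
 *   clEnqueueNDRangeKernel(queue, kernel, 2, NULL,
 *                          global_work_size, NULL, 0, NULL, NULL);
 */

/* CPU reference that visits the same (col, row) index space as the
 * NDRange above; useful for validating the results read back from
 * the device. */
void increment_reference(float *m, float delta)
{
    for (size_t row = 0; row < global_work_size[1]; ++row) {
        for (size_t col = 0; col < global_work_size[0]; ++col) {
            m[row * global_work_size[0] + col] += delta;
        }
    }
}
```

This replaces the 64 per-column kernels with one kernel and one enqueue call. Passing `NULL` as the local work size lets the implementation choose the work-group size, which is a reasonable starting point before tuning.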
Upvotes: 1