Reputation: 183
For an array X in global memory, I need to write two values in every kernel execution:
X[p]=mul1+mul2;
X[p+a]=mul1-mul2;
Here 'a' can range from 0 to very large values. I observed that these two writes slow down my kernel considerably.
Upvotes: 0
Views: 114
Reputation: 489
Assuming p depends linearly on your thread ID, you are doing things the right way. You could try passing X+a as a second argument to your kernel, so that you write Y[p]=mul1-mul2; instead of X[p+a]=mul1-mul2; but I doubt it will really be faster.
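To make the two variants concrete, here is a minimal sketch. It assumes CUDA (the question does not name the API), that p is the global thread index, and that mul1 and mul2 are read from two input arrays. The kernel names writeOffset and writeTwoPointers, the arrays in1 and in2, and the sizes are all made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Variant from the question: one base pointer, second store offset by a.
// Assumption: p is the global thread index; in1/in2 are hypothetical inputs.
__global__ void writeOffset(float *X, const float *in1, const float *in2,
                            int n, int a)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p < n) {
        float mul1 = in1[p];
        float mul2 = in2[p];
        X[p]     = mul1 + mul2;
        X[p + a] = mul1 - mul2;
    }
}

// Variant suggested above: the host passes Y = X + a as a separate argument.
__global__ void writeTwoPointers(float *X, float *Y, const float *in1,
                                 const float *in2, int n)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p < n) {
        float mul1 = in1[p];
        float mul2 = in2[p];
        X[p] = mul1 + mul2;
        Y[p] = mul1 - mul2;
    }
}

int main()
{
    const int n = 1 << 20;
    const int a = n;  // assumed offset; a >= n keeps the two output ranges disjoint

    float *X, *in1, *in2;
    cudaMallocManaged(&X, (size_t)(n + a) * sizeof(float));
    cudaMallocManaged(&in1, n * sizeof(float));
    cudaMallocManaged(&in2, n * sizeof(float));
    for (int i = 0; i < n; ++i) {
        in1[i] = 2.0f * i;
        in2[i] = 1.0f * i;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Both launches write the same values to the same locations.
    writeOffset<<<blocks, threads>>>(X, in1, in2, n, a);
    cudaDeviceSynchronize();
    writeTwoPointers<<<blocks, threads>>>(X, X + a, in1, in2, n);
    cudaDeviceSynchronize();

    printf("X[1] = %f, X[1 + a] = %f\n", X[1], X[1 + a]);

    cudaFree(X);
    cudaFree(in1);
    cudaFree(in2);
    return 0;
}
```

In both kernels, consecutive threads store to consecutive addresses, so each of the two stores can be coalesced. The second variant only moves the addition of a from the device to the host, which is why I would not expect a measurable difference. If a is not a multiple of the warp size, the second store may be less well aligned and touch one extra memory segment per warp, so timing both versions with your real values of a is the only way to be sure.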
Concerning your second question: if you are thinking of having two kernels, one performing the addition and the other the subtraction, and launching them concurrently, you cannot be sure they will actually run side by side in parallel. Once again, I doubt it will be faster in the end.
Upvotes: 0