Reputation: 374
I would like to know what techniques I could use to sum over one dimension of an array and save the result to a new, smaller vector, as in the following example:
A -> [1,2], [3,4], [5,6]
B -> [3], [7], [11]
figure: http://snag.gy/83Qwl.jpg
Upvotes: 0
Views: 184
Reputation: 151889
If you want to write your own CUDA kernel, take a look at the Vector add sample. Instead of passing 2 input vectors to the kernel, you would pass just A and provide a loop to sum over the "rows" of A:
__global__ void mykernel(int *A, int *B, int rows, int cols){
  // one thread per row of the row-major array A
  int idx = threadIdx.x + blockDim.x*blockIdx.x;
  if (idx < rows) {
    int sum = 0;
    for (int i = 0; i < cols; i++)
      sum += A[(idx*cols)+i];
    B[idx] = sum;
  }
}
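For completeness, here is a minimal sketch of how a kernel like this could be launched from the host. The names `rowSumKernel` and `rowSums` and the block size of 256 are assumptions made for this illustration, not part of the original answer, and error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Same row-sum logic as the kernel above (renamed for this sketch):
// one thread per row of a row-major matrix.
__global__ void rowSumKernel(const int *A, int *B, int rows, int cols) {
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < rows) {
        int sum = 0;
        for (int i = 0; i < cols; i++)
            sum += A[idx * cols + i];
        B[idx] = sum;
    }
}

// Hypothetical host helper: copies A to the GPU, launches the kernel,
// and copies the per-row sums back. No error checking.
std::vector<int> rowSums(const std::vector<int> &hA, int rows, int cols) {
    std::vector<int> hB(rows);
    int *dA = nullptr, *dB = nullptr;
    cudaMalloc(&dA, hA.size() * sizeof(int));
    cudaMalloc(&dB, rows * sizeof(int));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 256;                            // assumed block size
    int blocks = (rows + threads - 1) / threads;  // enough blocks to cover every row
    rowSumKernel<<<blocks, threads>>>(dA, dB, rows, cols);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(hB.data(), dB, rows * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    return hB;
}
```

Calling `rowSums({1, 2, 3, 4, 5, 6}, 3, 2)` corresponds to the A and B arrays in the question.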
This won't be terribly efficient, because adjacent threads read addresses that are cols elements apart, so the loads are not coalesced. You can improve the efficiency if you can store your A array in column-major order:
A -> [1,3,5], [2,4,6]
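Then a modification to the above kernel becomes pretty efficient, since on each loop iteration adjacent threads read adjacent elements:
__global__ void mykernel(int *A, int *B, int rows, int cols){
  // one thread per row; A is stored in column-major order
  int idx = threadIdx.x + blockDim.x*blockIdx.x;
  if (idx < rows) {
    int sum = 0;
    for (int i = 0; i < cols; i++)
      sum += A[(i*rows)+idx];  // element (idx, i) lives at i*rows + idx
    B[idx] = sum;
  }
}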
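If the data starts out in row-major order, the conversion to column-major can be sketched on the host as below. The helper name `toColumnMajor` is hypothetical. It relies only on the usual layout rule that element (r, c) of a rows x cols matrix sits at index r*cols + c in row-major storage and at c*rows + r in column-major storage.

```cuda
#include <vector>

// Hypothetical host helper: re-lays a rows x cols row-major matrix
// in column-major order. Element (r, c) moves from r*cols + c to c*rows + r.
std::vector<int> toColumnMajor(const std::vector<int> &rowMajor, int rows, int cols) {
    std::vector<int> colMajor(rowMajor.size());
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            colMajor[c * rows + r] = rowMajor[r * cols + c];
    return colMajor;
}
```

For the example in the question, `toColumnMajor({1, 2, 3, 4, 5, 6}, 3, 2)` yields the [1,3,5], [2,4,6] layout shown above. Whether the extra pass pays off depends on how often the array is summed after being converted.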
If you're looking for efficiency but can't reorganize your data, then a segmented parallel reduction will be fastest. You can try creating something based on the CUDA sample codes, but I would suggest using Thrust, specifically thrust::reduce_by_key.
You would leave your A array as is and use it as the "values":
A -> [1,2], [3,4], [5,6]
And you would create a "key" array which corresponds to the rows of your A array:
K -> [0,0], [1,1], [2,2]
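A minimal sketch of that approach follows. The wrapper name `rowSumsByKey` is hypothetical, and building the key array on the host is just one simple option chosen for this illustration. `thrust::reduce_by_key` sums each run of consecutive equal keys, so every row of A collapses into one entry of B.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/copy.h>
#include <vector>

// Hypothetical wrapper: sums each row of a row-major rows x cols matrix
// by treating the row index of every element as its key.
std::vector<int> rowSumsByKey(const std::vector<int> &hA, int rows, int cols) {
    // Key array built on the host: key[i] = row index of element i,
    // e.g. [0,0,1,1,2,2] for a 3 x 2 matrix.
    std::vector<int> hK(hA.size());
    for (size_t i = 0; i < hK.size(); i++)
        hK[i] = static_cast<int>(i) / cols;

    thrust::device_vector<int> K(hK.begin(), hK.end());  // keys
    thrust::device_vector<int> A(hA.begin(), hA.end());  // values
    thrust::device_vector<int> outKeys(rows);            // one key per row
    thrust::device_vector<int> B(rows);                  // one sum per row

    // Default predicate (equality) and default reduction (plus).
    thrust::reduce_by_key(K.begin(), K.end(), A.begin(),
                          outKeys.begin(), B.begin());

    std::vector<int> hB(rows);
    thrust::copy(B.begin(), B.end(), hB.begin());  // device -> host
    return hB;
}
```

For larger arrays the key array does not have to be stored at all; it could likely be generated on the fly with Thrust's fancy iterators, which this sketch leaves out to stay short.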
Upvotes: 1