user3856196
user3856196

Reputation: 369

Reduction of odd number of elements CUDA

It seems that it possible to make reduction only for odd number of elements. For example, it needs to sum up numbers. When I have even number of elements, it will be like this:

1 2 3 4
1+2
3+3
6+4

But what to do when I have, for instance 1 2 3 4 5? The last iteration is the sum of three elements 6+4+5 or what? I saw the same question here, but couldn't find the answer.

Upvotes: 3

Views: 1551

Answers (1)

Robert Crovella
Robert Crovella

Reputation: 151879

A parallel reduction will add pairs of elements first:

1  1+3   4+6
2  2+4
3
4

Your example with an odd number of elements would typically be realized as:

1  1+4  5+3  8+7
2  2+5  7+0
3  3+0
4  0+0
5
0
0
0

That is to say, typically a parallel reduction will work with a power-of-2 set of threads, and at most one threadblock (the last one) will have less than a full complement of data to work with. The usual method to handle this is to zero-pad the data out to the threadblock size. If you study the cuda parallel reduction sample code, you'll find examples of this.

Upvotes: 4

Related Questions