Reputation: 159
I want to implement a groupby reduction in OpenCL. For example, the input
a1 a2 a3 b1 b2 c3 c4
would produce
a6 b3 c7
The C pseudocode looks like this:
int data[n][2], result[n][2], result_count = -1,
added = 0, group = data[0][0];
for (int i = 0; i < n; i++) {
if (group == data[i][0]) {
added += data[i][1];
} else {
result[++result_count][0] = group;
result[result_count][1] = added;
group = data[i][0];
added = 0;
}
}
return result, result_count;
The only standard algorithm I know which goes in this direction is parallel reduction; however, it reduces to one number and not to a buffer of added values by group. I am not sure if parallel reduction could work with a dynamic result buffer (e.g. in local memory) and still be efficient in terms of performance.
Upvotes: 0
Views: 281
Reputation: 3012
A solution by Hashing
Phase 1) A hashing scheme can be used to hash the group value to a location, then an atomic add can sum the contents of the second value.
Phase 2) A prefix sum scan algorithm pass over the hash table to compact it.
Phase 3) Optionally sort the results
A solution by Sorting
Phase 1) Sort the data on the group value
Phase 2) Use a reduction to sum each group
Phase 3) A prefix sum scan to compact the sums
Upvotes: 1