Ren Geng

Reputation: 13

CUDA sum across blocks

Hello, I am new to CUDA programming and I have run into a problem.

I have a variable, call it foo, stored in the shared memory of each block; its value differs from one block to another. I want a single thread to sum these values across all blocks. I thought of writing foo to global memory and then computing the sum, but is there a function that can do this more quickly?

Thanks for your help.

Upvotes: 1

Views: 1148

Answers (1)

einpoklum

Reputation: 131544

It would be faster to have one thread in each block perform an atomicAdd() operation, adding the per-block value to a single, grid-wide variable in global memory (initialized to zero before the kernel runs).
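A minimal sketch of that approach follows. The names sumFooAcrossBlocks, sumAcrossBlocks and total are hypothetical, and the per-block value (blockIdx.x + 1) is only a placeholder for whatever your kernel actually computes into foo. Error checking of the CUDA runtime calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Each block holds one value `foo` in shared memory; thread 0 of every
// block adds it to a single grid-wide accumulator in global memory.
__global__ void sumFooAcrossBlocks(int *total)
{
    __shared__ int foo;  // the per-block value from the question

    if (threadIdx.x == 0) {
        // ASSUMPTION: placeholder value; replace with the real computation.
        foo = blockIdx.x + 1;
    }
    __syncthreads();  // make foo visible to all threads of the block

    if (threadIdx.x == 0) {
        atomicAdd(total, foo);  // one atomic operation per block
    }
}

// Hypothetical host helper: launches the kernel and returns the sum.
int sumAcrossBlocks(int numBlocks, int threadsPerBlock)
{
    int *dTotal = nullptr;
    int hTotal = 0;

    cudaMalloc(&dTotal, sizeof(int));
    cudaMemset(dTotal, 0, sizeof(int));  // accumulator must start at zero

    sumFooAcrossBlocks<<<numBlocks, threadsPerBlock>>>(dTotal);

    // This blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(&hTotal, dTotal, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dTotal);
    return hTotal;
}
```

With the placeholder values above, sumAcrossBlocks(4, 64) should return 10 (1 + 2 + 3 + 4) on a machine with a working CUDA device. atomicAdd() also has float and double overloads (double requires compute capability 6.0 or higher), but with floating-point values the order of the additions is not fixed, so the result can differ slightly from run to run.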

See the section on atomic functions in the CUDA C Programming Guide.

For a deeper exploration of optimizing reductions (of which summation is one example), although not necessarily the one you want to perform, have a look at Mark Harris's presentation, Optimizing Parallel Reduction in CUDA.
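For orientation, here is a basic, unoptimized shared-memory tree reduction. It is not taken from the presentation; it is only a sketch of the kind of kernel that such optimizations start from. The names blockReduceSum, reduceOnDevice and kBlockSize are hypothetical, the block size is assumed to be a power of two equal to kBlockSize, n is assumed to be positive, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

constexpr int kBlockSize = 256;  // ASSUMPTION: power of two, equals blockDim.x

// Each block sums its slice of `in` in shared memory, then thread 0
// adds the block's partial sum to the grid-wide accumulator `total`.
__global__ void blockReduceSum(const int *in, int n, int *total)
{
    __shared__ int partial[kBlockSize];

    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    // Load one element per thread; pad out-of-range threads with zero.
    partial[tid] = (idx < n) ? in[idx] : 0;
    __syncthreads();

    // Halve the number of active threads at every step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();
    }

    if (tid == 0) {
        atomicAdd(total, partial[0]);
    }
}

// Hypothetical host helper: copies the data over and returns the sum.
int reduceOnDevice(const int *hostData, int n)
{
    int *dIn = nullptr;
    int *dTotal = nullptr;
    int hTotal = 0;

    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dTotal, sizeof(int));
    cudaMemcpy(dIn, hostData, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(int));

    int numBlocks = (n + kBlockSize - 1) / kBlockSize;  // round up
    blockReduceSum<<<numBlocks, kBlockSize>>>(dIn, n, dTotal);

    cudaMemcpy(&hTotal, dTotal, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dTotal);
    return hTotal;
}
```

The final step is the same one-atomicAdd()-per-block pattern as in the answer above; the tree reduction only matters when each block first has to combine many values of its own.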

Upvotes: 2
