In short: is it possible to determine whether a block is the last (and whether it is the first) to run on a particular SM?
Details: I have a problem where each block performs a fairly complex calculation that results in an array of about 2K elements, and I want to sum these elements. I have about 3K blocks. But if I do an atomic add to a global memory array at the end of each block, that could be very slow. So what I would like to do:
Is this possible? Or is there another solution?
It's not possible.
Shared memory is allocated per block. The lifetime of the shared memory begins when the block begins and ends when the block ends. Shared memory of other blocks on the SM will be separate, and it's not legal or valid to assume they would happen to be in the same place.
Each block should do its own reduction and write its values to global memory. If you want to avoid the atomics, have each block write its own values to separate locations in global memory, and have the last block in the grid perform the final calculation. This is possible by following the method outlined in the threadfence reduction sample code.
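A minimal CUDA sketch of the last-block idea, modeled on the threadfence reduction sample's use of `__threadfence()` plus an `atomicInc` ticket counter (not a copy of that sample). It assumes the goal is the element-wise sum of every block's n-element array; `computeElement`, `blocksDone`, `perBlockThenLastBlockSum` and `runLastBlockSum` are hypothetical names, and `computeElement` is only a placeholder for the real per-block calculation.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placeholder for the per-block "complex calculation":
// block `block` produces element `i` of its result array.
__device__ float computeElement(int block, int i) {
    return static_cast<float>(block + i);
}

// Counts finished blocks; the last block resets it so the kernel can be relaunched.
__device__ unsigned int blocksDone = 0;

__global__ void perBlockThenLastBlockSum(float* partial, float* result, int n) {
    // 1. Each block writes its own array to its own slice of global memory.
    float* mine = partial + static_cast<size_t>(blockIdx.x) * n;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        mine[i] = computeElement(blockIdx.x, i);

    // 2. Make this thread's writes visible device-wide, then wait until
    //    every thread in the block has done the same.
    __threadfence();
    __syncthreads();

    __shared__ bool amLast;
    if (threadIdx.x == 0) {
        // atomicInc returns the old value, so the last block to arrive sees gridDim.x - 1.
        unsigned int ticket = atomicInc(&blocksDone, gridDim.x);
        amLast = (ticket == gridDim.x - 1);
    }
    __syncthreads();

    // 3. Only the last block to finish sums all per-block arrays.
    if (amLast) {
        // volatile loads force the values written by other blocks to be read from memory
        const volatile float* all = partial;
        for (int i = threadIdx.x; i < n; i += blockDim.x) {
            float sum = 0.0f;
            for (unsigned int b = 0; b < gridDim.x; ++b)
                sum += all[static_cast<size_t>(b) * n + i];
            result[i] = sum;
        }
        if (threadIdx.x == 0) blocksDone = 0;  // reset for the next launch
    }
}

// Host helper: launches the kernel and copies back the n summed elements.
std::vector<float> runLastBlockSum(int numBlocks, int n) {
    float* dPartial = nullptr;
    float* dResult = nullptr;
    cudaError_t err = cudaMalloc(&dPartial, sizeof(float) * static_cast<size_t>(numBlocks) * n);
    assert(err == cudaSuccess);
    err = cudaMalloc(&dResult, sizeof(float) * n);
    assert(err == cudaSuccess);

    perBlockThenLastBlockSum<<<numBlocks, 256>>>(dPartial, dResult, n);

    std::vector<float> h(n);
    // cudaMemcpy waits for the kernel to finish before copying.
    err = cudaMemcpy(h.data(), dResult, sizeof(float) * n, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dPartial);
    cudaFree(dResult);
    return h;
}
```

With 3K blocks and 2K floats per block, the `partial` buffer is about 24 MB. The trade-off is that the last block performs all of the final additions on its own, so for large grids a separate second kernel (or a tree reduction inside the last block) may well be faster; this sketch keeps the final pass simple.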
You could also have each block loop over multiple data sets. In that case, each block will be able to accumulate the results from several data sets into shared memory, before writing the intermediate results to global memory.
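A sketch of the looping variant, assuming the data sets can be numbered 0..numDataSets-1 and computed independently: launch far fewer blocks than data sets, let each block accumulate several data sets in shared memory, write one intermediate array per block to global memory, and finish with a small second kernel. `computeElement`, `accumulateDataSets`, `sumPartials` and `runLoopedSum` are hypothetical names, and the second kernel is just one option for the final pass (the last-block technique above would also work).

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placeholder for the per-data-set calculation.
__device__ float computeElement(int dataSet, int i) {
    return static_cast<float>(dataSet + i);
}

// Each block handles data sets blockIdx.x, blockIdx.x + gridDim.x, ...
// and accumulates them in dynamic shared memory (n floats).
__global__ void accumulateDataSets(float* partial, int numDataSets, int n) {
    extern __shared__ float acc[];
    for (int i = threadIdx.x; i < n; i += blockDim.x) acc[i] = 0.0f;
    // Each thread only touches its own indices here, but a real calculation
    // where threads share data needs these barriers.
    __syncthreads();

    for (int ds = blockIdx.x; ds < numDataSets; ds += gridDim.x)
        for (int i = threadIdx.x; i < n; i += blockDim.x)
            acc[i] += computeElement(ds, i);
    __syncthreads();

    // Write this block's intermediate result to its own slice of global memory.
    float* mine = partial + static_cast<size_t>(blockIdx.x) * n;
    for (int i = threadIdx.x; i < n; i += blockDim.x) mine[i] = acc[i];
}

// Second pass: one thread per element sums the per-block intermediate results.
__global__ void sumPartials(const float* partial, float* result, int numBlocks, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float sum = 0.0f;
    for (int b = 0; b < numBlocks; ++b)
        sum += partial[static_cast<size_t>(b) * n + i];
    result[i] = sum;
}

std::vector<float> runLoopedSum(int numDataSets, int n, int numBlocks) {
    float* dPartial = nullptr;
    float* dResult = nullptr;
    cudaError_t err = cudaMalloc(&dPartial, sizeof(float) * static_cast<size_t>(numBlocks) * n);
    assert(err == cudaSuccess);
    err = cudaMalloc(&dResult, sizeof(float) * n);
    assert(err == cudaSuccess);

    const int threads = 256;
    accumulateDataSets<<<numBlocks, threads, sizeof(float) * n>>>(dPartial, numDataSets, n);
    // Same (default) stream, so this kernel starts only after the first one finishes.
    sumPartials<<<(n + threads - 1) / threads, threads>>>(dPartial, dResult, numBlocks, n);

    std::vector<float> h(n);
    err = cudaMemcpy(h.data(), dResult, sizeof(float) * n, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dPartial);
    cudaFree(dResult);
    return h;
}
```

For example, `runLoopedSum(3000, 2048, 120)` processes 3000 data sets with only 120 blocks, so the intermediate buffer holds 120 arrays instead of 3000. The block count is a tunable assumption; a small multiple of the number of SMs is a common starting point.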