Sachin

Reputation: 3782

Parallelising running sum of a matrix in CUDA

I need to calculate the cumulative sum of a matrix, where the value at each index (i,j) of the new matrix is the sum of all elements in the sub-matrix from (0,0) to (i,j) of the original one. Is there a way to parallelise this across multiple CUDA threads?

Upvotes: 0

Views: 2006

Answers (1)

Edric

Reputation: 25160

The cumulative sum is a scan, and the CUDA SDK includes "scan" examples. Check the ScanLargeArray example. This is a very highly refined algorithm, and there's even a paper (http://developer.download.nvidia.com/compute/cuda/1_1/Website/projects/scan/doc/scan.pdf) describing all the steps taken to optimise it.
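To see why a 1D scan is enough for the matrix case: the 2D cumulative sum can be computed as an inclusive scan along every row, followed by an inclusive scan along every column of the result. Here is a minimal host-side C++ sketch of that decomposition. It is a sequential reference for checking results, not the SDK code, and the function name `cumulative_sum_2d` and the row-major layout are my own assumptions. Every row in the first pass (and every column in the second) is independent of the others, and that independence is what you would spread across CUDA threads or blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (name and signature are illustrative, not from the CUDA SDK).
// Computes the 2D inclusive cumulative sum in place on a row-major matrix.
void cumulative_sum_2d(std::vector<float>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);

    // Pass 1: inclusive scan along each row.
    // Rows do not depend on each other, so each one could be scanned in parallel.
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 1; c < cols; ++c) {
            m[r * cols + c] += m[r * cols + c - 1];
        }
    }

    // Pass 2: inclusive scan along each column of the row-scanned matrix.
    // Columns do not depend on each other either.
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 1; r < rows; ++r) {
            m[r * cols + c] += m[(r - 1) * cols + c];
        }
    }
}
```

For the 2x3 matrix {1, 2, 3; 4, 5, 6} this gives {1, 3, 6; 5, 12, 21}. On the GPU, each inner loop would be replaced by a parallel scan such as the one in the SDK example.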

Upvotes: 2
