Haidy

Reputation: 1

Speed up CUDA program

Which part should I change to speed up this code? And what exactly is this code doing?

__global__ void mat(Matrix a, Matrix b) 
{
   int *tempData = new int[2];
   tempData[0] = threadIdx.x ;
   tempData[1] = blockIdx.x * blockDim.x;
   b.elements[tempData[1] + tempData[0]] = b.elements[tempData[1] + tempData[0]] * 5;
   delete[] tempData;
}

Upvotes: 0

Views: 110

Answers (1)

Robert Crovella

Reputation: 151799

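If that is all the code in question, this is silliness. It allocates a two-element array on the device heap in every thread just to hold two integers, and in-kernel heap allocation is slow, whereas plain local variables normally live in registers:

int *tempData = new int[2];
tempData[0] = threadIdx.x ;
tempData[1] = blockIdx.x * blockDim.x;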

Just do this instead:

__global__ void mat(Matrix a, Matrix b) 
{
   int tempData_0 = threadIdx.x;
   int tempData_1 = blockIdx.x * blockDim.x;
   b.elements[tempData_1 + tempData_0] = b.elements[tempData_1 + tempData_0] * 5;
}

The sum tempData[0] + tempData[1] effectively computes the canonical CUDA 1D globally unique thread index:

int idx = threadIdx.x+blockDim.x*blockIdx.x;
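The index formula is plain integer arithmetic, so it can be pulled into a helper that compiles for both host and device and can be checked on the CPU. This is a sketch: `globalIndex1D` and `fillWithIndex` are hypothetical names used only for illustration, and the bounds check in the kernel is an addition (not in the original) that lets the grid be rounded up past n.

```cuda
// Hypothetical helper: the canonical 1D global thread index.
// Marked __host__ __device__ so the same arithmetic can be tested on the CPU.
__host__ __device__ inline int globalIndex1D(int threadIdxX, int blockIdxX, int blockDimX)
{
    return threadIdxX + blockDimX * blockIdxX;
}

// Hypothetical kernel: each thread writes its own global index.
__global__ void fillWithIndex(int *out, int n)
{
    // Pass the built-in variables; inside a kernel this equals
    // threadIdx.x + blockDim.x * blockIdx.x.
    int idx = globalIndex1D(threadIdx.x, blockIdx.x, blockDim.x);
    if (idx < n)          // guard: the grid may contain more threads than elements
        out[idx] = idx;
}
```

For example, with 256 threads per block, thread 5 of block 2 gets 5 + 256 * 2 = 517. Block 0 covers indices 0 to 255, block 1 covers 256 to 511, and so on, so every thread in a 1D grid gets a distinct index.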

With that index, the body of your kernel is simply:

b.elements[idx] = b.elements[idx] * 5;

This multiplies each element of a vector (or of a matrix whose rows or columns are stored contiguously) by 5 in place, with one thread handling one element.
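To make that concrete, here is a minimal, self-contained sketch. The question never shows the definition of `Matrix`, so the struct below is an assumption modeled on the one in the CUDA C++ Programming Guide (width, height, row-major `float` elements). The host helper `scaleByFiveOnDevice` is a hypothetical name, and the bounds check is an addition the original kernel lacks.

```cuda
#include <cuda_runtime.h>

// Assumed layout (not shown in the question): row-major, width * height floats.
struct Matrix {
    int width;
    int height;
    float *elements;
};

// Same work as the kernel above, plus a bounds check (an addition) so the
// grid can be rounded up to a whole number of blocks.
__global__ void mat(Matrix a, Matrix b)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < b.width * b.height)
        b.elements[idx] = b.elements[idx] * 5;
}

// Hypothetical host helper: copies data to the GPU, runs mat(), copies it back.
cudaError_t scaleByFiveOnDevice(float *host, int width, int height)
{
    int n = width * height;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    Matrix b = {width, height, nullptr};

    cudaError_t err = cudaMalloc(&b.elements, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(b.elements, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;   // round up
        mat<<<blocks, threads>>>(b, b);   // 'a' is unused by the kernel
        err = cudaGetLastError();         // catches launch-configuration errors
    }
    if (err == cudaSuccess)   // this copy waits for the kernel to finish
        err = cudaMemcpy(host, b.elements, bytes, cudaMemcpyDeviceToHost);

    cudaFree(b.elements);
    return err;
}
```

Calling `scaleByFiveOnDevice` on `{1, 2, 3}` with width 3 and height 1 should leave `{5, 10, 15}` in the array. Without the bounds check, the original kernel is only safe when the launch creates exactly as many threads as there are elements.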

Your code could be simplified further to make it easier to read, following the outline I have given you using idx, but those changes won't make a significant performance difference: the compiler can perform those kinds of transformations on its own.

Upvotes: 1
