I'm writing a program with CUDA and the problem is the following:
Two matrices A (n * 128) and B (m * 128)
I take the first row of A, and I compute the distance between that vector and all the rows of B, one by one.
I write the result of each distance on a row of a matrix C, so the element C(i,j) of C contains the distance between row i of A and row j of B.
Then I proceed with the next row of A.
I've implemented it this way: I have a grid of (n * m) blocks and 128 threads per block (1 * 128).
The program compiles, but it doesn't give correct distances. I can't figure out what's wrong.
PS: I have CUDA 6.0 with an NVIDIA GTX 650 (compute capability 3.0)
__global__ void EuclidianDistances( float *A, float *B , float *C , int n , int m)
{
    // SIZE is equal to 128
    __shared__ float accumResult[SIZE];
    __shared__ float sA[SIZE];
    __shared__ float sB[SIZE];

    // MAPPING
    int bx = blockIdx.x;  // n
    int by = blockIdx.y;  // m
    int ty = threadIdx.y; // 128
    int tx = threadIdx.x; // 1

    sA[ty] = A [bx * SIZE + ty];
    sB[ty] = B [by * SIZE + ty];
    __syncthreads();

    accumResult[ty] = (sA[ty] - sB[ty])*(sA[ty] - sB[ty]);
    __syncthreads();

    // Parallel tree-reduction
    for (int stride = SIZE/2 ; stride < 0 ; stride >>= 1)
        if (ty < stride)
        {
            accumResult[ty] += accumResult [stride + ty];
            __syncthreads();
        }

    // Writing results to output matrix
    if ((threadIdx.y == 0))
        C [bx * m + by] = accumResult[ty];
    __syncthreads();
}
Upvotes: 0
Views: 1635
Reputation: 43662
The condition looks wrong:
for (int stride = SIZE/2 ; stride < 0 ; stride >>= 1)
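Assuming SIZE is 128 as you said, the loop body will never be executed: stride starts at 64, and 64 < 0 is false on the very first check. The condition should be stride > 0. Also, once the loop does run, the __syncthreads()
inside the if statement is only reached by the threads with ty < stride, which might stall the entire block; it should sit outside the if so that every thread calls it.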
Edit: after reading OP's comments, I realized this is a language problem. Here is a snippet:
#include <iostream>
using namespace std;

int main() {
    int SIZE = 128;
    for (int stride = SIZE/2 ; stride < 0 ; stride >>= 1)
        cout << "Hello I'm running" << endl;
    return 0;
}
The output is: nothing. Take a look at the for loop syntax in C++: the second part is the condition that must be true for the loop to keep running, and it is checked before every iteration, including the first. If you start with a false condition, your loop is never going to be executed.
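Putting the two points above together, the kernel might look like the sketch below. Treat it as a minimal sketch, not a tested drop-in: the kernel name SquaredDistances and the host helper computeSquaredDistances are names I made up for illustration, error checking is omitted, and it assumes SIZE is a power of two and m is at most 65535 (the limit on the grid's y dimension). Like your original, it writes the sum of squared differences; wrap the result in sqrtf() if you want the actual Euclidean distance.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define SIZE 128  // row length, as in the question (must be a power of two here)

// One block per (row of A, row of B) pair; SIZE threads along y, as in the question.
__global__ void SquaredDistances(const float *A, const float *B, float *C, int n, int m)
{
    __shared__ float accumResult[SIZE];

    int bx = blockIdx.x;   // row of A, 0..n-1
    int by = blockIdx.y;   // row of B, 0..m-1
    int ty = threadIdx.y;  // element within the row, 0..SIZE-1

    // Each thread computes one squared difference.
    float diff = A[bx * SIZE + ty] - B[by * SIZE + ty];
    accumResult[ty] = diff * diff;
    __syncthreads();

    // Parallel tree reduction: the loop runs while stride > 0.
    for (int stride = SIZE / 2; stride > 0; stride >>= 1)
    {
        if (ty < stride)
            accumResult[ty] += accumResult[stride + ty];
        // Barrier is outside the if, so every thread in the block reaches it.
        __syncthreads();
    }

    // Thread 0 writes the sum of squared differences for this pair of rows.
    // Assumption: use sqrtf(accumResult[0]) if the distance itself is wanted.
    if (ty == 0)
        C[bx * m + by] = accumResult[0];
}

// Hypothetical host helper (not from the question): copies the inputs to the
// device, launches the kernel, and copies the n * m results back.
// Error checking is omitted for brevity.
std::vector<float> computeSquaredDistances(const std::vector<float> &A,
                                           const std::vector<float> &B, int n, int m)
{
    float *dA, *dB, *dC;
    cudaMalloc(&dA, n * SIZE * sizeof(float));
    cudaMalloc(&dB, m * SIZE * sizeof(float));
    cudaMalloc(&dC, n * m * sizeof(float));
    cudaMemcpy(dA, A.data(), n * SIZE * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), m * SIZE * sizeof(float), cudaMemcpyHostToDevice);

    dim3 grid(n, m);      // n * m blocks
    dim3 block(1, SIZE);  // 1 * 128 threads per block
    SquaredDistances<<<grid, block>>>(dA, dB, dC, n, m);

    // cudaMemcpy waits for the kernel to finish before copying.
    std::vector<float> C(n * m);
    cudaMemcpy(C.data(), dC, n * m * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Calling computeSquaredDistances(A, B, n, m) with row-major host vectors returns C as a flat vector, where C[i * m + j] corresponds to row i of A and row j of B.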
Upvotes: 1