Reputation: 529
I have the following MATLAB code:
[N, d] = size(X); % data size and dimensions
R = rand(d,dt); % Form a random matrix with elements in [0,1]
% Random projection
Y = X * R;
w=720; % hashing step
b = w * rand(dt,1);
% Compute the hash codes of the data
binId = floor( bsxfun(@plus, Y, b') / w);
and I tried to parallelize it using cuBLAS and a kernel, as follows:
__global__ void compute(const int N, const int dt, const int w, const float *old, int *newt) {
    // One thread per element of the N x dt projection matrix (column-major)
    int col = blockDim.y * blockIdx.y + threadIdx.y;
    int row = blockDim.x * blockIdx.x + threadIdx.x;
    int id = row + N * col;
    if (row < N && col < dt) {
        // Hashing step: floor of the (already bias-shifted) projection divided by w
        newt[id] = (int)floorf(old[id] / w);
    }
}
void gpu_blas_mmul(cublasHandle_t handle, const float *A, const float *B, float *C,
                   const int m, const int k, const int n, const float bet) {
    int lda = m, ldb = k, ldc = m;
    const float alf = 1.0f;
    const float *alpha = &alf;
    const float *beta = &bet;
    // C = alpha * A * B + beta * C (column-major, no transposition)
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
}
float *d_R, *d_RX, *d_B_row;
int *d_H;
thrust::device_vector<float> d_X(h_X, h_X + N * d);
cudaMalloc(&d_R, d * dt * sizeof(float));
cudaMemcpy(d_R, h_R, d * dt * sizeof(float), cudaMemcpyHostToDevice);
cudaMalloc(&d_B_row, dt * sizeof(float));
cudaMemcpy(d_B_row, h_B_row, dt * sizeof(float), cudaMemcpyHostToDevice);
cudaMalloc(&d_RX, N * dt * sizeof(float));
cudaMalloc(&d_H, N * dt * sizeof(int));
//------------------------- cuBLAS -----------------------
cublasHandle_t handle;
cublasCreate(&handle);
thrust::device_vector<float> d_B_col(N, w);
// d_RX = d_B_col * d_B_row (rank-1 product, N x dt)
gpu_blas_mmul(handle, thrust::raw_pointer_cast(&d_B_col[0]), d_B_row, d_RX, N, 1, dt, 0.0);
// d_RX = d_X * d_R + d_RX
gpu_blas_mmul(handle, thrust::raw_pointer_cast(&d_X[0]), d_R, d_RX, N, d, dt, 1.0);
cublasDestroy(handle);
//----------------------- Kernel ----------------------------
dim3 blockSize(BLOCK_SIZE, BLOCK_SIZE, 1);
int linGrid1 = (int)ceil(N / (float)BLOCK_SIZE);
int linGrid2 = (int)ceil(dt / (float)BLOCK_SIZE);
dim3 gridSize(linGrid1, linGrid2, 1);
compute<<<gridSize, blockSize>>>(N, dt, w, d_RX, d_H);
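(Not shown above: to compare against MATLAB's binId, d_H has to be copied back to the host. A minimal sketch of that step, where h_H is just an ordinary host buffer I introduce here for illustration, plus a basic launch-error check:)
// Sketch: check the launch and copy the hash codes back for comparison with binId
int *h_H = (int*)malloc(N * dt * sizeof(int));
cudaError_t err = cudaGetLastError();
if (err != cudaSuccess)
    printf("compute launch failed: %s\n", cudaGetErrorString(err));
cudaMemcpy(h_H, d_H, N * dt * sizeof(int), cudaMemcpyDeviceToHost);  // synchronizes with the kernel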
In h_X, h_R and h_B_row I have saved (in column-major order) X, R and b produced by MATLAB. The dataset I am using is ANN_SIFT1M from http://corpus-texmex.irisa.fr/
For about 10000 values the results are exactly the same, but when I try with 50000 values, for example, there are some differences, and they become more frequent as the number of values increases.
Any idea what I am doing wrong?
Upvotes: 2
Views: 184
Reputation: 5697
Your MATLAB code uses double precision, so its result is more accurate. In contrast, the CUDA code you provided uses single precision (type float) and therefore produces a less accurate result. As is usual with single- vs. double-precision issues, the problem only gets worse as you increase the size of your input data.
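For intuition, here is a tiny, self-contained illustration (the values are contrived, not taken from your data) of how a single rounding error in float is enough to push a sum across a bin boundary before the floor:
#include <math.h>
#include <stdio.h>

int main(void) {
    // Contrived values: y + b sits just below a multiple of w = 720
    double y = 719999.5, b = 0.499;            // y + b = 719999.999
    float  yf = (float)y, bf = (float)b;       // single-precision copies

    double binD = floor((y + b) / 720.0);      // 999
    float  binF = floorf((yf + bf) / 720.0f);  // 1000: yf + bf rounds up to 720000
    printf("double bin = %.0f, float bin = %.0f\n", binD, binF);
    return 0;
}
In your case the rounding happens gradually inside the single-precision GEMM, but the effect on the hash code is the same.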
The solution would be to use type double instead of float.
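A minimal sketch of that change, assuming h_X, h_R and h_B_row are also kept as double on the host and the device buffers are allocated with sizeof(double); only the element type and the GEMM routine change, the rest of the code stays the same:
__global__ void compute(const int N, const int dt, const int w, const double *old, int *newt) {
    int col = blockDim.y * blockIdx.y + threadIdx.y;
    int row = blockDim.x * blockIdx.x + threadIdx.x;
    int id = row + N * col;
    if (row < N && col < dt) {
        newt[id] = (int)floor(old[id] / w);   // same hashing step, now in double precision
    }
}

void gpu_blas_mmul(cublasHandle_t handle, const double *A, const double *B, double *C,
                   const int m, const int k, const int n, const double bet) {
    int lda = m, ldb = k, ldc = m;
    const double alf = 1.0;
    // Double-precision GEMM: C = alf * A * B + bet * C
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alf, A, lda, B, ldb, &bet, C, ldc);
}
Keep in mind that double-precision throughput is much lower than single-precision on most GeForce-class GPUs, so this trades speed for accuracy.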
Upvotes: 4