Reputation: 37
I'm having trouble writing my program in CUDA. The program is an encryption routine that multiplies a matrix by a vector and produces a result that depends on the input vector. I timed both a C++ version and the CUDA version, and the C++ version is faster. Because I need several keys for the encryption, I run the multiplication in a loop. The code is as follows:
t1 = clock();
do {
    HANDLE_ERROR ( cudaMemcpy(MAT_dev, MAT, nBytes, cudaMemcpyHostToDevice) );
    HANDLE_ERROR ( cudaMemcpy(VEC_dev, VEC, nBytes, cudaMemcpyHostToDevice) );
    mult<<< 1, b >>>(MAT_dev, VEC_dev, SOL_dev, b);
    HANDLE_ERROR ( cudaMemcpy(SOL, SOL_dev, nBytes, cudaMemcpyDeviceToHost) );
    for (i = 0; i < b; i++) {
        cout << SOL[i] << " ";
    }
    cout << endl;
    for (i = 0; i < b; i++) {
        VEC[i] = SOL[i];
    }
    cont = cont + 1;
} while (cont < w);
t2 = clock();
My results:
The number of keys was 1,000,000, with a 7 x 7 matrix and a vector of length 7.
I don't know whether this is expected or whether I'm missing something that would make it faster.
Thanks for your help.
Upvotes: 1
Views: 154
Reputation: 9781
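Possible problems in your code:
cudaMemcpy(), which is called three times on every iteration of the timed loop
and cout <<, which prints every key inside the same timed loop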
Possible solutions:
You may wish to read the CUDA C Programming Guide and the CUDA C Best Practices Guide before writing your own kernels.
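One way to act on the two points above is to keep the whole chain of multiplications on the device, so the matrix and starting vector are copied in once and all keys are copied back once. The following is only a minimal sketch under assumptions that the question does not confirm: the elements are taken to be `unsigned` (with wrap-around on overflow), `mult` is taken to be a plain matrix-vector product, and the names `chain_mult`, `generate_keys` and `CHECK` are hypothetical stand-ins for the original `mult` and `HANDLE_ERROR`.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the HANDLE_ERROR macro used in the question.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s\n",                  \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Runs all w iterations in a single launch (one block of b threads).
// Thread `row` computes one element of the product; the current vector
// lives in shared memory and is overwritten after every iteration.
__global__ void chain_mult(const unsigned* mat, const unsigned* vec0,
                           unsigned* keys, int b, int w)
{
    extern __shared__ unsigned vec[];          // b elements
    int row = threadIdx.x;
    if (row < b) vec[row] = vec0[row];
    __syncthreads();

    for (int k = 0; k < w; ++k) {
        unsigned acc = 0;
        if (row < b)
            for (int j = 0; j < b; ++j)
                acc += mat[row * b + j] * vec[j];
        __syncthreads();                       // all reads of vec are done
        if (row < b) {
            vec[row] = acc;                    // becomes the next input
            keys[(size_t)k * b + row] = acc;   // key k, element row
        }
        __syncthreads();                       // new vec visible to all
    }
}

// Copies the inputs in once, launches once, copies all w keys back once.
std::vector<unsigned> generate_keys(const std::vector<unsigned>& mat,
                                    const std::vector<unsigned>& vec,
                                    int b, int w)
{
    assert(b > 0 && w > 0);
    assert(mat.size() == (size_t)b * b && vec.size() == (size_t)b);

    size_t mat_bytes  = mat.size() * sizeof(unsigned);
    size_t vec_bytes  = vec.size() * sizeof(unsigned);
    size_t keys_count = (size_t)w * b;

    unsigned *mat_dev = nullptr, *vec_dev = nullptr, *keys_dev = nullptr;
    CHECK(cudaMalloc(&mat_dev, mat_bytes));
    CHECK(cudaMalloc(&vec_dev, vec_bytes));
    CHECK(cudaMalloc(&keys_dev, keys_count * sizeof(unsigned)));

    CHECK(cudaMemcpy(mat_dev, mat.data(), mat_bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(vec_dev, vec.data(), vec_bytes, cudaMemcpyHostToDevice));

    chain_mult<<<1, b, b * sizeof(unsigned)>>>(mat_dev, vec_dev, keys_dev, b, w);
    CHECK(cudaGetLastError());

    std::vector<unsigned> keys(keys_count);
    CHECK(cudaMemcpy(keys.data(), keys_dev, keys_count * sizeof(unsigned),
                     cudaMemcpyDeviceToHost));

    CHECK(cudaFree(mat_dev));
    CHECK(cudaFree(vec_dev));
    CHECK(cudaFree(keys_dev));
    return keys;
}
```

Key k occupies elements `k * b` to `k * b + b - 1` of the returned vector. Printing them after the timed region, and with `'\n'` rather than `endl`, keeps console output out of the measurement. Even so, a 7 x 7 product occupies only 7 threads, so a plain CPU loop may still be faster at this size.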
Upvotes: 3