Scabro93

Reputation: 37

Better timing in C++ than in CUDA

I'm having trouble with my program in CUDA. The program is an encryption that multiplies a matrix by a vector and produces a result that depends on the input vector. I timed the program in both C++ and CUDA, and C++ comes out faster than CUDA. Because I need several keys for the encryption, I put the work in a loop; the code is as follows:

t1 = clock();
do {

    // copy the current key matrix and input vector to the device
    HANDLE_ERROR ( cudaMemcpy(MAT_dev, MAT, nBytes, cudaMemcpyHostToDevice) );
    HANDLE_ERROR ( cudaMemcpy(VEC_dev, VEC, nBytes, cudaMemcpyHostToDevice) );

    // one block of b threads computes MAT * VEC
    mult<<< 1, b >>>(MAT_dev, VEC_dev, SOL_dev, b);

    // copy the result back to the host
    HANDLE_ERROR ( cudaMemcpy(SOL, SOL_dev, nBytes, cudaMemcpyDeviceToHost) );

    // print the result
    for (i = 0; i < b; i++) {
        cout << SOL[i] << " ";
    }
    cout << endl;

    // feed the result back in as the next input vector
    for (i = 0; i < b; i++) {
        VEC[i] = SOL[i];
    }

    cont = cont + 1;

} while (cont < w);
t2 = clock();
t2 = clock();

My results :

C++ : 11.474 minutes

CUDA : 40.464 minutes

The number of keys were 1,000,000. Matrix 7 x 7 and a Vector 7.

I don't know if this is normal or if I'm missing something that would make it faster.

Thanks for your help.

Upvotes: 1

Views: 154

Answers (1)

kangshiyin

Reputation: 9781

Possible problems with your code:

  1. most of the time is probably spent in cudaMemcpy() and cout <<, not in the kernel;
  2. speed may be limited by the grid/block size. Generally speaking, the number of blocks in a grid should be >= the number of streaming multiprocessors to fully utilize the GPU hardware; the number of threads in a block should be at least 64 and always a multiple of the warp size;
  3. a 7 x 7 matrix and a 7-element vector are too small to achieve good scalability.
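To confirm where the time goes, you can time the copies and the kernel separately with CUDA events. A minimal sketch for one iteration, reusing MAT_dev, VEC_dev, SOL_dev, b, and nBytes from the question:

```cuda
// Time one iteration's host-to-device copies and the kernel separately.
cudaEvent_t start, afterCopy, afterKernel;
cudaEventCreate(&start);
cudaEventCreate(&afterCopy);
cudaEventCreate(&afterKernel);

cudaEventRecord(start);
cudaMemcpy(MAT_dev, MAT, nBytes, cudaMemcpyHostToDevice);
cudaMemcpy(VEC_dev, VEC, nBytes, cudaMemcpyHostToDevice);
cudaEventRecord(afterCopy);

mult<<< 1, b >>>(MAT_dev, VEC_dev, SOL_dev, b);
cudaEventRecord(afterKernel);
cudaEventSynchronize(afterKernel);

float copyMs, kernelMs;
cudaEventElapsedTime(&copyMs, start, afterCopy);    // time spent in cudaMemcpy()
cudaEventElapsedTime(&kernelMs, afterCopy, afterKernel); // time spent in the kernel
printf("copy: %f ms, kernel: %f ms\n", copyMs, kernelMs);
```

For a 7 x 7 problem you will likely find the copy (and the cout << printing on the host) dominates the kernel time by a wide margin.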

Possible solutions:

  1. Instead of doing 1,000,000 m_{7x7} * v_{7} products, try to do one m_{7,000,000x7} * v_{7} product;
  2. try to merge the 1,000,000 cudaMemcpy() calls into 1;
  3. use cudaMallocPitch() to allocate memory for the small matrices, which relaxes alignment problems;
  4. try the cublas<t>gemv() routines from the cuBLAS library if the element type of your matrix/vector is float or double.
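Suggestion 1 could be sketched as below. This assumes the per-key products are independent; in the question's loop each iteration feeds SOL back into VEC, so that dependency would first have to be removed or restructured. Here the N key matrices are stacked into one (N*7) x 7 array on the device, and one thread computes one output row:

```cuda
// Batched mat-vec: n keys, each a b x b matrix times a b-vector (b = 7).
// mat holds n stacked matrices row-major, vec holds n stacked b-vectors.
__global__ void multBatched(const int *mat, const int *vec, int *sol,
                            int n, int b)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;  // global output row
    if (row < n * b) {
        int key = row / b;                            // which key's vector
        int acc = 0;
        for (int j = 0; j < b; j++)
            acc += mat[row * b + j] * vec[key * b + j];
        sol[row] = acc;
    }
}

// Launch with enough 256-thread blocks to cover all n*b rows,
// after a single cudaMemcpy of all inputs:
//   multBatched<<< (n * b + 255) / 256, 256 >>>(MAT_dev, VEC_dev, SOL_dev, n, b);
```

This replaces 1,000,000 tiny <<<1, 7>>> launches (and their memcpys) with one launch whose grid is large enough to keep all streaming multiprocessors busy.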

You may wish to read the CUDA C Programming Guide and the CUDA C Best Practices Guide before writing your own kernels.

Upvotes: 3
