Measuring effective bandwidth on CUDA

Question

How do I calculate the effective memory bandwidth of the following call?

cublasSdot(handle, M, devPtrA, 1, devPtrB, 1, &curesult);

The function is declared in cublas_v2.h.

The call takes 0.46 ms, and each vector holds 10000 floats (10000 * sizeof(float) bytes).

Is the effective bandwidth ((10000 * 4) / 10^9) / 0.00046 = 0.087 GB/s?

I'm asking because I don't know what happens inside cublasSdot, and I don't know whether I need to know that to do the calculation.

kangshiyin · Accepted Answer

In your case, the input data is 10000 * 4 * 2 bytes because there are 2 input vectors, and the output data is 4 bytes. The effective bandwidth is therefore ((10000 * 4 * 2 + 4) / 10^9) / 0.00046, or about 0.174 GB/s.
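The arithmetic above can be sketched as a small C helper. This is only an illustration: the function names effective_bandwidth_gbs and sdot_bandwidth_gbs are made up for this example and are not part of cuBLAS, and the sketch assumes 1 GB = 10^9 bytes and that each input element is read exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a cuBLAS function).
   Effective bandwidth in GB/s, with 1 GB = 10^9 bytes:
   (bytes read + bytes written) / 10^9 / elapsed seconds. */
double effective_bandwidth_gbs(size_t bytes_read, size_t bytes_written,
                               double seconds)
{
    assert(seconds > 0.0); /* elapsed time must be positive */
    return ((double)bytes_read + (double)bytes_written) / 1e9 / seconds;
}

/* Hypothetical helper for a single-precision dot product of two
   n-element vectors: both input vectors are read once and a single
   float result is written. */
double sdot_bandwidth_gbs(size_t n, double seconds)
{
    size_t bytes_read = 2 * n * sizeof(float);
    size_t bytes_written = sizeof(float);
    return effective_bandwidth_gbs(bytes_read, bytes_written, seconds);
}
```

With the numbers from the question, sdot_bandwidth_gbs(10000, 0.00046) gives roughly 0.17 GB/s, about twice the figure obtained when only one input vector is counted.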

cublasSdot() does little beyond the computation itself. Profiling shows that it launches 2 kernels to compute the result. If the pointer mode is CUBLAS_POINTER_MODE_HOST, which is the cuBLAS default, it also performs an extra 4-byte device-to-host memory transfer.
