DeviceToDevice bandwidth formula in CUDA sample code bandwidthTest

Question

The bandwidthTest sample code provided with the SDK uses the following formula for DtoD transfers:

bandwidthInMBs = 2.0f * ((float)(1<<10) * memSize * (float)MEMCOPY_ITERATIONS) / (elapsedTimeInMs * (float)(1 << 20));

The 2.0f multiplier at the beginning is absent from the DtoH and HtoD formulas. Why? Is this because two copy operations are performed in the DtoD case, so that twice memSize is actually transferred?

Also, how accurate is this formula on a system with physically unified memory, such as the Jetson TK1? Is the 2.0f multiplier still necessary there?

For example, on the Jetson TK1 I'm getting the following numbers:

DtoH = 6.1 GB/s

HtoD = 6.1 GB/s

DtoD = 12.2 GB/s (just because of the multiplier!)
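
To make the effect of the multiplier concrete, here is a minimal sketch in plain C that mirrors the structure of the formula above. The function name bandwidth_mbs and the traffic_factor parameter are hypothetical and not part of the sample; they are only there so the same arithmetic can be run with a factor of 1.0 (as in the HtoD/DtoH formulas) and 2.0 (as in the DtoD formula). The comments reflect my reading of the constants: (1 << 10) appears to stand in for the milliseconds-to-seconds conversion and (1 << 20) for bytes-to-MB.

```c
#include <assert.h>
#include <stddef.h>

/* Bandwidth in MB/s, following the structure of the sample's formula.
 * traffic_factor is a hypothetical parameter (not in the sample):
 *   1.0 counts each copied byte once, as the HtoD/DtoH formulas do;
 *   2.0 counts each byte twice, as the DtoD formula does. */
static double bandwidth_mbs(size_t mem_size_bytes, int iterations,
                            double elapsed_ms, double traffic_factor)
{
    assert(elapsed_ms > 0.0);

    /* (1 << 10) plays the role of the ms-to-s conversion (1024, not 1000),
     * and (1 << 20) converts bytes to MB, as in the quoted line. */
    double scaled_bytes = (double)(1 << 10) * (double)mem_size_bytes
                          * (double)iterations;
    return traffic_factor * scaled_bytes / (elapsed_ms * (double)(1 << 20));
}
```

For the same memSize, iteration count, and elapsed time, calling this with a factor of 2.0 returns exactly twice the value obtained with 1.0. That matches the pattern in my numbers: 12.2 GB/s for DtoD against 6.1 GB/s for the other two directions.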
