Reputation: 849
I want to know how well my CUDA kernels utilise memory bandwidth. I run them on a Tesla K40c with ECC on. Is the result reported by the bandwidthTest utility a good approximation of the attainable peak? If not, how would one go about writing a similar test to find the peak bandwidth?
I mean device memory bandwidth.
Upvotes: 1
Views: 408
Reputation: 4097
The source code for bandwidthTest is included with the CUDA SDK, so you can review it directly. The bandwidthTest example measures the transfer time from the device to the host, from the host to the device, and from the device to the device (copying memory on the card).
This is a real execution of a memory transfer but it takes advantage of several things:
Doing real work with a kernel while performing memory transfers will likely reduce performance. However, you can use the bandwidthTest code as a guide for improving your transfers. Consider pinned memory, asynchronous transfers, or the newer host/device shared memory methods (such as unified memory) that do not require explicit transfer of data. Also keep in mind that bandwidthTest only measures bulk transfers around memory and does not really measure things like shared memory.
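As a rough illustration of the pinned memory and asynchronous transfer suggestion, here is a minimal sketch that times host-to-device copies from a page-locked buffer using CUDA events. The buffer size, repetition count, the `CHECK` macro and the `gbPerSecond` helper are my own illustrative choices, not something taken from bandwidthTest.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical helper: bytes and elapsed milliseconds to GB/s (1 GB = 1e9 bytes).
static double gbPerSecond(double bytes, float ms) {
    return bytes / ((double)ms * 1e-3) / 1e9;
}

int main() {
    const size_t bytes = (size_t)64 << 20;  // 64 MiB, an arbitrary illustrative size
    const int reps = 10;                    // arbitrary repetition count

    float *hPinned = nullptr, *dBuf = nullptr;
    CHECK(cudaMallocHost((void**)&hPinned, bytes));  // page-locked (pinned) host buffer
    CHECK(cudaMalloc((void**)&dBuf, bytes));
    memset(hPinned, 0, bytes);

    cudaStream_t stream;
    cudaEvent_t start, stop;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Time a batch of asynchronous copies issued into one stream.
    CHECK(cudaEventRecord(start, stream));
    for (int i = 0; i < reps; ++i)
        CHECK(cudaMemcpyAsync(dBuf, hPinned, bytes, cudaMemcpyHostToDevice, stream));
    CHECK(cudaEventRecord(stop, stream));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("Host to device (pinned): %.2f GB/s\n",
           gbPerSecond((double)bytes * reps, ms));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(dBuf));
    CHECK(cudaFreeHost(hPinned));
    return 0;
}
```

Compile with something like `nvcc -O2 pinned_copy.cu` (the file name is arbitrary). Swapping `cudaMallocHost` for a plain `malloc` should show how much of the measured rate depends on the buffer being pinned.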
The final performance will depend greatly on the kernel and the count and size of the memory transfers you are performing.
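For device memory bandwidth specifically, one possible way to write a similar test is to time a simple copy kernel and count both the bytes read and the bytes written. The sketch below assumes a grid-stride `float4` copy; the kernel name, buffer size, launch configuration and the `effectiveBandwidthGBs` helper are illustrative and untuned, so treat the number it prints as an estimate of what a streaming kernel attains rather than a definitive peak.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical helper: traffic volume and elapsed milliseconds to GB/s (1 GB = 1e9 bytes).
static double effectiveBandwidthGBs(double bytesMoved, float ms) {
    return bytesMoved / ((double)ms * 1e-3) / 1e9;
}

// Grid-stride copy: every element is read once and written once.
__global__ void copyKernel(const float4* __restrict__ in,
                           float4* __restrict__ out, size_t n) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        out[i] = in[i];
}

int main() {
    const size_t n = (size_t)16 << 20;        // 16 Mi float4 elements = 256 MiB per buffer
    const size_t bytes = n * sizeof(float4);
    const int reps = 20;                      // arbitrary repetition count
    const int threads = 256, blocks = 1024;   // illustrative launch configuration, not tuned

    float4 *dIn = nullptr, *dOut = nullptr;
    CHECK(cudaMalloc((void**)&dIn, bytes));
    CHECK(cudaMalloc((void**)&dOut, bytes));
    CHECK(cudaMemset(dIn, 0, bytes));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Warm-up launch, not timed.
    copyKernel<<<blocks, threads>>>(dIn, dOut, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaEventRecord(start));
    for (int i = 0; i < reps; ++i)
        copyKernel<<<blocks, threads>>>(dIn, dOut, n);
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaGetLastError());

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    // Each repetition reads `bytes` and writes `bytes`, hence the factor of 2.
    printf("Copy kernel: %.2f GB/s\n",
           effectiveBandwidthGBs(2.0 * (double)bytes * reps, ms));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(dIn));
    CHECK(cudaFree(dOut));
    return 0;
}
```

Comparing your own kernels against this figure, using the same read-plus-write byte accounting, gives a like-for-like utilisation estimate. It is also worth comparing it with the device-to-device number that bandwidthTest reports on the same card.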
Upvotes: 1