Effect of visual studio compiler settings on performance of CUDA kernels

Question

I get about 3-4x times difference in computation time of a same CUDA kernel compiled on two different machines. Both versions run on a same machine and GPU device. The direct conclusion explaining the difference is different compiler settings. Although there is no single perfect setting and the tuning should be customized depending on the kernel, I wonder if there is any clear guideline for helping to choose the right settings. I use Visual Studio 2010. Thank you.

Robert Crovella · Accepted Answer

Compile in release mode, not debug mode, if you want fastest performance. The -G switch passed to the nvcc compiler will usually have a negative effect on GPU code performance.
It's generally recommended to select the right architecture for the GPU you are compiling for. For example, if you have a cc 2.1 capability GPU, make sure that setting (sm_21, in GPU code settings) is being passed to the compiler. There are some counter examples to this (e.g. compiling for cc 2.0 seems to run faster, etc.) but as a general recommendation, it is best.
Use the latest version of CUDA (compiler). This is especially important when using GPU libraries (CUFFT, CUBLAS, etc.) (yes, this is not really a compiler setting)

Effect of visual studio compiler settings on performance of CUDA kernels

Answers (1)

Related Questions