Reputation: 743
In OpenMP, the programmer can hint to the compiler that the body of a loop can be vectorized. Is there something similar in CUDA C? Can we tell nvcc to use vector instructions when translating the body of a loop? The code is supposed to be executed by thread processors, which are SIMD, so I would expect this to be possible.
Upvotes: 0
Views: 273
Reputation: 602
CUDA C is not a translation engine, but OpenACC is. It is like OpenMP for accelerators such as graphics cards, and it should answer your question: https://developer.nvidia.com/openacc
Upvotes: 0
Reputation: 151809
Is there something similar in CUDA C? Can we tell nvcc to use vector instructions when translating the body of a loop?
CUDA C is not a translation engine in the way that OpenMP is, where pragmas direct the compiler to transform the code.
For the most part, CUDA GPUs have no vector instructions (excepting SIMD intrinsics and the corresponding PTX SIMD Video instructions). Typically, "vectorization" on a GPU is achieved via the SIMT mechanism.
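To make the exception concrete, here is a minimal sketch of one of the SIMD intrinsics, `__vadd4`, which treats each 32-bit word as four packed 8-bit lanes and adds them lane by lane with wraparound. The kernel name `add_packed_bytes` and the host helper `add_packed_on_gpu` are hypothetical names chosen for illustration, and error handling is kept deliberately short.

```cuda
#include <cuda_runtime.h>

// Each 32-bit word holds four packed 8-bit lanes. __vadd4 adds the lanes
// independently, so a carry out of one byte never spills into the next.
__global__ void add_packed_bytes(const unsigned int *a, const unsigned int *b,
                                 unsigned int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __vadd4(a[i], b[i]);
}

// Hypothetical host helper (illustration only): adds one pair of packed words
// on the GPU. Returns false if any CUDA call reports an error.
bool add_packed_on_gpu(unsigned int a, unsigned int b, unsigned int *result)
{
    unsigned int *d = nullptr;  // d[0] = a, d[1] = b, d[2] = output
    if (cudaMalloc(&d, 3 * sizeof(unsigned int)) != cudaSuccess)
        return false;

    unsigned int h[2] = {a, b};
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);

    add_packed_bytes<<<1, 1>>>(d, d + 1, d + 2, 1);
    bool ok = (cudaGetLastError() == cudaSuccess);  // catches launch failures

    // cudaMemcpy waits for the kernel before copying the result back.
    ok = (cudaMemcpy(result, d + 2, sizeof(unsigned int),
                     cudaMemcpyDeviceToHost) == cudaSuccess) && ok;
    cudaFree(d);
    return ok;
}
```

Calling `add_packed_on_gpu(0x00FF00FFu, 0x00010001u, &r)` shows the difference from an ordinary integer add: each `0xFF + 0x01` lane wraps to `0x00` on its own, whereas a plain 32-bit addition would carry into the neighbouring byte.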
A CUDA GPU thread processor is not SIMD. It is a single-threaded, single-data processor. SIMD/SIMT behavior comes about through the aggregation of adjacent threads into warps.
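As a rough sketch of what this means for a loop you might otherwise ask a CPU compiler to vectorize: in CUDA you typically write the loop body as scalar code for a single thread and launch one thread per iteration. The hardware then groups adjacent threads into warps (32 threads on NVIDIA GPUs) that execute the same instruction together. The names `saxpy_kernel` and `saxpy_on_gpu` below are hypothetical, and the block size of 256 is an arbitrary choice for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

// The body of the CPU loop "for (i = 0; i < n; ++i) y[i] = a * x[i] + y[i];"
// written as scalar code for one thread. No vector instructions appear here;
// the parallelism comes from running many threads, grouped into warps.
__global__ void saxpy_kernel(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's loop index
    if (i < n)                                      // guard the partial last block
        y[i] = a * x[i] + y[i];
}

// Hypothetical host helper (illustration only): computes y = a * x + y on the
// GPU. Returns false if the sizes do not match or a CUDA call fails.
bool saxpy_on_gpu(float a, const std::vector<float> &x, std::vector<float> &y)
{
    int n = static_cast<int>(x.size());
    if (n == 0 || y.size() != x.size())
        return false;

    size_t bytes = n * sizeof(float);
    float *d_x = nullptr, *d_y = nullptr;
    if (cudaMalloc(&d_x, bytes) != cudaSuccess)
        return false;
    if (cudaMalloc(&d_y, bytes) != cudaSuccess) {
        cudaFree(d_x);
        return false;
    }
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;                          // threads per block (assumed)
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    saxpy_kernel<<<blocks, threads>>>(n, a, d_x, d_y);
    bool ok = (cudaGetLastError() == cudaSuccess);  // catches launch failures

    // cudaMemcpy waits for the kernel before copying the result back.
    ok = (cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) && ok;
    cudaFree(d_x);
    cudaFree(d_y);
    return ok;
}
```

Note that the kernel contains no loop at all: the grid of threads replaces it, which is why there is nothing for a vectorization hint to act on.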
You may wish to review one of the CUDA whitepapers, such as the Fermi whitepaper (e.g. page 7), which gives an overview of GPU thread execution.
Upvotes: 2