Tell nvcc to execute a loop's iterations in SIMD mode

Question

In OpenMP, the programmer can hint to the compiler that the body of a loop can be vectorized. Is there something similar in CUDA C? Can we tell nvcc to use vector instructions when translating the body of a loop? The code is executed by thread processors, which I understand to be SIMD, so this seems plausible.

Robert Crovella · Accepted Answer

Is there something similar in CUDA C? Can we tell nvcc to use vector instructions when translating the body of a loop?

CUDA C is not a translation engine in the way OpenMP is, where pragmas direct how the compiler translates the code.

For the most part, CUDA GPUs have no vector instructions (the exceptions are the SIMD intrinsics and the corresponding PTX SIMD video instructions). Typically, "vectorization" on a GPU is achieved via the SIMT (single instruction, multiple threads) mechanism.
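To illustrate the exception mentioned above, here is a minimal sketch using `__vadd4`, one of the SIMD intrinsics, which adds four packed 8-bit lanes inside a 32-bit word with per-byte wraparound. The kernel name `add_packed_bytes` and the host helper `run_add_packed_bytes` are hypothetical names chosen for this example, and error checking on the CUDA calls is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread adds one pair of 32-bit words,
// treating each word as four independent unsigned 8-bit lanes.
__global__ void add_packed_bytes(int n, const unsigned int* a,
                                 const unsigned int* b, unsigned int* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // __vadd4: per-byte addition, no carry between lanes.
        out[i] = __vadd4(a[i], b[i]);
    }
}

// Hypothetical host helper: copies inputs to the device, launches the
// kernel, and returns the result. CUDA error checking is omitted.
std::vector<unsigned int> run_add_packed_bytes(const std::vector<unsigned int>& a,
                                               const std::vector<unsigned int>& b)
{
    assert(a.size() == b.size());
    int n = static_cast<int>(a.size());
    std::vector<unsigned int> out(n);
    if (n == 0) return out;

    size_t bytes = n * sizeof(unsigned int);
    unsigned int *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    add_packed_bytes<<<blocks, threads>>>(n, d_a, d_b, d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return out;
}
```

Note that this is something you call explicitly on packed data; it is not a hint that makes nvcc vectorize an ordinary loop.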

A CUDA GPU thread processor is not SIMD. It is a single-threaded, single-data processor. The SIMD/SIMT behavior comes from aggregating adjacent threads into warps (groups of 32 threads).
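In practice, the usual way to get this behavior is to drop the loop and let each thread execute one iteration; the hardware then runs adjacent threads together as a warp. The following is a minimal sketch of that idea, assuming a simple element-wise loop. The names `saxpy` and `run_saxpy` are illustrative, not taken from the question, and error checking on the CUDA calls is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// CPU version of the loop being replaced:
//   for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
//
// GPU version: the loop index becomes the global thread index, so each
// thread executes scalar code for exactly one iteration.
__global__ void saxpy(int n, float a, const float* x, float* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the grid may contain more threads than elements
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical host helper: copies inputs to the device, launches one
// thread per element, and returns the updated y. Error checking omitted.
std::vector<float> run_saxpy(float a, const std::vector<float>& x,
                             std::vector<float> y)
{
    assert(x.size() == y.size());
    int n = static_cast<int>(x.size());
    if (n == 0) return y;

    size_t bytes = n * sizeof(float);
    float *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;                         // a multiple of the warp size
    int blocks = (n + threads - 1) / threads;  // round up to cover all n
    saxpy<<<blocks, threads>>>(n, a, d_x, d_y);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    return y;
}
```

The kernel body contains no vector instructions. Threads with consecutive values of `i` fall into the same warp and execute the same instruction at the same time, which is where the data parallelism comes from.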

You may wish to review one of the CUDA whitepapers, such as the Fermi whitepaper (e.g. page 7), which gives an overview of GPU thread execution.
