Maxim Blumental

Reputation: 743

Tell nvcc to execute a loop's iterations in SIMD mode

In OpenMP, the programmer can hint to the compiler that the body of a loop can be vectorized. Is there something similar in CUDA C? Can we tell nvcc to use vector instructions when translating the body of a loop? The code is supposed to be executed by thread processors, which as far as I understand are SIMD, so this seems like it should be possible.

Upvotes: 0

Views: 273

Answers (2)

Gabriel Garcia

Reputation: 602

CUDA C is not a translation engine, but OpenACC is. It is the equivalent of OpenMP for accelerators such as graphics cards, and it should answer your question: https://developer.nvidia.com/openacc

Upvotes: 0

Robert Crovella

Reputation: 151809

"Is there something similar in CUDA C? Can we tell nvcc to use vector instructions when translating body of a loop?"

CUDA C is not a translation engine in the way that OpenMP is, where pragmas cause the compiler to translate the annotated code.

For the most part, CUDA GPUs have no vector instructions (except for the SIMD intrinsics and the corresponding PTX SIMD video instructions). Typically, "vectorization" on a GPU is achieved via the SIMT (single instruction, multiple threads) mechanism.
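To make the SIMT point concrete, here is a minimal sketch of how a loop that one might mark for vectorization in OpenMP is usually written in CUDA C: the loop disappears and each thread handles one iteration. The kernel name `saxpy`, the helper `run_saxpy_demo`, the problem size, and the block size of 256 are all illustrative choices, not anything from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: each thread performs one iteration of the serial loop
//   for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the last, partial block
        y[i] = a * x[i] + y[i];
}

// Hypothetical host helper: runs the kernel and reports whether every
// element has the expected value.
bool run_saxpy_demo()
{
    const int n = 1 << 20;
    float *x = nullptr, *y = nullptr;
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return false;
    }
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    const int threads = 256;                          // assumed block size
    const int blocks  = (n + threads - 1) / threads;  // round up to cover n
    saxpy<<<blocks, threads>>>(n, 3.0f, x, y);

    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess);
    for (int i = 0; ok && i < n; ++i)
        ok = (y[i] == 5.0f);                          // 3 * 1 + 2

    cudaFree(x);
    cudaFree(y);
    return ok;
}

int main()
{
    bool ok = run_saxpy_demo();
    printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

Nothing in the kernel body is a vector instruction. The parallelism comes entirely from launching many threads that run the same scalar code on different indices.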

A CUDA GPU thread processor is not SIMD. It is a single-threaded, single-data processor. SIMD/SIMT behavior comes about by the aggregation of adjacent threads into warps (groups of 32 threads) that are issued the same instruction together.
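As a rough illustration of how adjacent threads are grouped into warps, the sketch below has each thread of a one-dimensional block record which warp it belongs to and its position (lane) within that warp. The names `record_lanes` and `run_warp_demo` and the block size of 64 are assumptions made for this example. `warpSize` is the built-in device variable, and the host reads the same value from `cudaGetDeviceProperties`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel for a 1-D block: consecutive thread indices fall
// into the same warp, warpSize threads at a time.
__global__ void record_lanes(int *warp_id, int *lane_id)
{
    int t = threadIdx.x;
    warp_id[t] = t / warpSize;  // which warp within the block
    lane_id[t] = t % warpSize;  // position within that warp
}

// Hypothetical host helper: launches one block and checks the grouping.
bool run_warp_demo()
{
    const int threads = 64;     // assumed block size (two warps when warpSize is 32)
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    const int ws = prop.warpSize;

    int *warp_id = nullptr, *lane_id = nullptr;
    if (cudaMallocManaged(&warp_id, threads * sizeof(int)) != cudaSuccess) return false;
    if (cudaMallocManaged(&lane_id, threads * sizeof(int)) != cudaSuccess) {
        cudaFree(warp_id);
        return false;
    }

    record_lanes<<<1, threads>>>(warp_id, lane_id);
    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess);

    for (int t = 0; ok && t < threads; ++t)
        ok = (warp_id[t] == t / ws) && (lane_id[t] == t % ws);

    cudaFree(warp_id);
    cudaFree(lane_id);
    return ok;
}

int main()
{
    bool ok = run_warp_demo();
    printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

Each thread still runs ordinary scalar code. The warp grouping is something the hardware does with adjacent threads, not something expressed in the loop body.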

You may wish to review one of the CUDA whitepapers, such as the Fermi whitepaper (e.g. page 7), which gives an overview of GPU thread execution.

Upvotes: 2
