Reputation: 611
Is there any advantage to using floatN instead of float in OpenCL?
For example:
float3 position;
versus
float posX, posY, posZ;
Thank you
Upvotes: 4
Views: 3219
Reputation: 5087
In both NVIDIA and AMD architectures, memory is divided into banks of 128 bits. Reading a single float3 or float4 value is often faster for the memory controller than reading three separate floats.
When you read float values from consecutive memory addresses, you are relying heavily on the compiler to combine the reads for you, and there is no guarantee that posX, posY, and posZ are in the same bank. Declaring the position as a float3 places the component floats within one aligned 16-byte (128-bit) slot, because OpenCL C gives 3-component vectors the same size and alignment as 4-component ones.
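To illustrate the layout argument, here is a minimal host-side sketch in plain C. It assumes the 128-bit granularity described above and only models address arithmetic, not any vendor's actual memory controller. The type names float3_like and loose_xyz and the helper functions are hypothetical: float3_like mimics OpenCL's float3 (16-byte size and alignment, with one padding lane), while loose_xyz holds three separate floats with only 4-byte alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of an OpenCL float3: the OpenCL C spec gives
   3-component vectors the size and alignment of 4 components, so
   x, y, z plus one padding lane fill one aligned 16-byte slot. */
typedef struct {
    alignas(16) float x;
    float y, z, pad;
} float3_like;

static_assert(sizeof(float3_like) == 16, "float3_like should fill 16 bytes");
static_assert(alignof(float3_like) == 16, "float3_like should be 16-byte aligned");

/* Three separate floats: 4-byte alignment, 12 bytes per record
   (no padding is expected on mainstream ABIs). */
typedef struct {
    float posX, posY, posZ;
} loose_xyz;

static_assert(sizeof(loose_xyz) == 12, "expected no padding in loose_xyz");

/* 1 if the bytes [p, p + len) lie inside a single 16-byte (128-bit)
   segment, 0 if they straddle a segment boundary. */
static int in_one_segment(const void *p, size_t len) {
    uintptr_t addr = (uintptr_t)p;
    return addr / 16 == (addr + len - 1) / 16;
}

/* Count records whose x, y, z components straddle a 16-byte boundary. */
static int straddling_vec(const float3_like *a, int n) {
    int count = 0;
    for (int i = 0; i < n; i++)
        count += !in_one_segment(&a[i], 3 * sizeof(float)); /* x, y, z only */
    return count;
}

static int straddling_loose(const loose_xyz *a, int n) {
    int count = 0;
    for (int i = 0; i < n; i++)
        count += !in_one_segment(&a[i], sizeof(loose_xyz));
    return count;
}
```

With four records in each array (both arrays starting on a 16-byte boundary), none of the float3_like records straddle a boundary, while two of the four packed 12-byte loose_xyz records do (the ones at byte offsets 12 and 24). The price of the float3 layout is the padding lane: 16 bytes per position instead of 12.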
How the GPUs handle the vector computations varies between vendors, but memory accesses on both platforms will benefit from vectorization.
Upvotes: 2
Reputation: 23248
It depends on the hardware.
NVIDIA GPUs have a scalar architecture, so vector types offer little advantage over purely scalar code on them. Quoting the NVIDIA OpenCL Best Practices Guide:
The CUDA architecture is a scalar architecture. Therefore, there is no performance benefit from using vector types and instructions. These should only be used for convenience. It is also in general better to have more work-items than fewer using large vectors.
With CPUs and ATI GPUs, you will gain more from using vectors, as these architectures have vector instructions (though I've heard this might be different on the latest Radeons; I wish I had a link to the article where I read this).
Quoting the ATI Stream OpenCL Programming Guide, for CPUs:
The SIMD floating point resources in a CPU (SSE) require the use of vectorized types (float4) to enable packed SSE code generation and extract good performance from the SIMD hardware.
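To make the CPU point concrete, here is a sketch in plain C rather than OpenCL. It assumes GCC or Clang: the vector_size attribute is a compiler extension (not standard C), used here only as a stand-in for OpenCL's float4, and the function names are hypothetical. Both functions compute y += a * x. The second operates on four lanes per operation, which x86 compilers typically lower to packed SSE instructions, instead of relying on the compiler to auto-vectorize the scalar loop.

```c
#include <assert.h>

/* GCC/Clang vector extension: a 16-byte vector of four floats,
   analogous to OpenCL's float4 (assumption: GCC or Clang compiler). */
typedef float vfloat4 __attribute__((vector_size(16)));

/* Scalar version: one float per multiply and per add. */
static void axpy_scalar(float a, const float *x, float *y, int n) {
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Vector version: each multiply and add covers four lanes at once.
   n4 is the number of 4-float vectors, not the number of floats. */
static void axpy_vec4(float a, const vfloat4 *x, vfloat4 *y, int n4) {
    for (int i = 0; i < n4; i++)
        y[i] += a * x[i]; /* scalar a is broadcast to all four lanes */
}
```

Both versions produce the same results; the difference is that the float4 form states the 4-wide packing explicitly, which is what the quoted guide says the SSE hardware needs.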
This article provides a performance comparison on ATI GPUs of a kernel written with vectors vs pure scalar types.
Upvotes: 8
Reputation: 16364
I'm not terribly familiar with OpenCL, but in GLSL doing math with vectors is more efficient because the GPU can apply the same operation to all N components concurrently. GLSL vectors also support operations like dot products as built-in language features.
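OpenCL C likewise provides dot() as a built-in for float2, float3, and float4. As a rough illustration of what such a built-in does, here is a hypothetical C stand-in, again assuming GCC/Clang vector extensions in place of real vector types: one lane-wise multiply across all components, followed by a horizontal sum. The names dot4 and dot3 are made up for this sketch, and a real GPU implementation may differ.

```c
#include <assert.h>

/* GCC/Clang vector extension standing in for float4 (assumption). */
typedef float vfloat4 __attribute__((vector_size(16)));

/* Lane-wise multiply (one vector operation), then sum the four products. */
static float dot4(vfloat4 a, vfloat4 b) {
    vfloat4 p = a * b;
    return p[0] + p[1] + p[2] + p[3];
}

/* A float3 dot product ignores the fourth (padding) lane; zero it here. */
static float dot3(vfloat4 a, vfloat4 b) {
    a[3] = 0.0f;
    return dot4(a, b);
}
```

For a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, dot4 returns 70 and dot3 returns 38.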
Upvotes: 1