Michelle

Reputation: 611

float VS floatN

Is there any advantage to using floatN instead of float in OpenCL?

For example:

float3 position;

and

float posX, posY, posZ;

Thank you

Upvotes: 4

Views: 3219

Answers (3)

mfa

Reputation: 5087

On both NVIDIA and AMD architectures, memory is divided into 128-bit banks. Reading a single float3 or float4 value is often faster for the memory controller than reading three separate floats.

When you read float values from consecutive memory addresses, you rely heavily on the compiler to combine the reads for you. There is no guarantee that posX, posY, and posZ are in the same bank. Declaring the position as a float3 usually forces its component floats into the same bank.

How GPUs handle vector computations varies between vendors, but memory accesses on both platforms will benefit from the vectorization.
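To make the layout argument concrete, here is a minimal host-side C sketch. It assumes the OpenCL C rule that a 3-component vector has the size and alignment of a 4-component one, and mimics that with a hypothetical `packed_pos` struct. It does not model memory banks; it only shows that the three components of one position share a single aligned 16-byte block, while separate arrays place them far apart. All names here are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's float3: 12 bytes of data,
   padded and aligned to 16 bytes (assumption taken from the OpenCL C rules). */
typedef struct {
    _Alignas(16) float x;
    float y;
    float z;
} packed_pos;

static_assert(sizeof(packed_pos) == 16, "packed_pos should fill one 16-byte slot");

/* Bytes between element i and element i + 1 in an array of packed_pos. */
size_t packed_stride(void) {
    return sizeof(packed_pos);
}

/* Returns 1 if x and z of element i fall in the same 16-byte block,
   measured from the start of the array. */
int packed_in_one_block(const packed_pos *p, size_t i) {
    size_t first = (size_t)((const char *)&p[i].x - (const char *)p) / 16;
    size_t last  = (size_t)((const char *)&p[i].z - (const char *)p) / 16;
    return first == last;
}

/* Distance in bytes between x[i] and z[i] when the components are stored
   as three back-to-back arrays of n floats each (hypothetical scalar layout). */
size_t split_distance(size_t n) {
    return 2 * n * sizeof(float);
}
```

With 1024 positions, `split_distance(1024)` puts `x[i]` and `z[i]` 8 KB apart (assuming 4-byte floats), whereas the packed form keeps each position inside one 16-byte slot.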

Upvotes: 2

prunge

Reputation: 23248

It depends on the hardware.

NVIDIA GPUs have a scalar architecture, so vectors provide little advantage over purely scalar code on them. Quoting the NVIDIA OpenCL Best Practices Guide:

The CUDA architecture is a scalar architecture. Therefore, there is no performance benefit from using vector types and instructions. These should only be used for convenience. It is also in general better to have more work-items than fewer using large vectors.

With CPUs and ATI GPUs, you will gain more from using vectors, because these architectures have vector instructions (though I've heard this might be different on the latest Radeons; I wish I had a link to the article where I read this).

Quoting the ATI Stream OpenCL Programming Guide, for CPUs:

The SIMD floating point resources in a CPU (SSE) require the use of vectorized types (float4) to enable packed SSE code generation and extract good performance from the SIMD hardware.
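As a rough illustration of what "vectorized types" buy you on a CPU, here is a small C sketch. It is not OpenCL: it uses the GCC/Clang `vector_size` extension (not standard C) as a stand-in for OpenCL's float4, and the function names are hypothetical. The scalar version touches one float per operation, while the float4 version expresses four lanes per operation, which is the form a compiler can map directly onto packed SIMD instructions.

```c
#include <assert.h>

/* GCC/Clang vector extension (not standard C), used here as a
   stand-in for OpenCL's float4: four floats in one 16-byte value. */
typedef float float4 __attribute__((vector_size(16)));

static_assert(sizeof(float4) == 16, "float4 should be 16 bytes");

/* Scalar form: out[i] = a * x[i] + y[i], one element per operation. */
void saxpy_scalar(float a, const float *x, const float *y, float *out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = a * x[i] + y[i];
}

/* Vector form: the same computation, four elements per operation.
   n4 is the number of float4 values, i.e. a quarter of the float count. */
void saxpy_float4(float a, const float4 *x, const float4 *y, float4 *out, int n4) {
    const float4 av = {a, a, a, a};  /* broadcast the scalar to all lanes */
    for (int i = 0; i < n4; ++i)
        out[i] = av * x[i] + y[i];
}
```

Both functions compute the same values; whether the vector form is actually faster depends on the compiler and hardware, as the rest of this answer says.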

This article provides a performance comparison on ATI GPUs of a kernel written with vectors vs pure scalar types.

Upvotes: 8

Russell Zahniser

Reputation: 16364

I'm not terribly familiar with OpenCL, but in GLSL, doing math with vectors is more efficient because the GPU can apply the same operation to all N components concurrently. GLSL vectors also support operations like dot products as built-in language features.
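For what it's worth, OpenCL C also has a built-in `dot` for its float vector types. The plain C sketch below only spells out what such a built-in computes for three components; `vec3` and `dot3` are hypothetical names, not part of either language.

```c
/* Hypothetical 3-component vector; GLSL and OpenCL C provide
   their own built-in vector types instead. */
typedef struct {
    float x, y, z;
} vec3;

/* Component-wise multiply and sum: the arithmetic behind a
   3-component dot product. */
float dot3(vec3 a, vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

With three separate floats you would write this expression out by hand each time; with a vector type it is a single call.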

Upvotes: 1
