sarasvati
sarasvati

Reputation: 792

iOS Metal: The fastest way to read read-only data?

Situation: In a Metal kernel function, every thread in a threadgroup reads the exact same value at a time. The kernel pseudocode:

kernel void foo(device   int2*   ranges,  
                constant float3& readonlyBuffer,  
                device   float*  results,  
                uint lno [[ threadgroup_position_in_grid ]])  
{  
  float acc = 0.0;  

  for(int i=ranges[lno].x; i<ranges[lno].y; i++) {  
    // each thread in threadgroup processes the same value from the buffer  
    acc += process( readonlyBuffer[i] );  
  }  

  results[...] = acc;  
} 

The problem: in the pursuit of optimizing buffer reads, I changed readonlyBuffer's address space qualifier from device to constant. This had zero impact on the kernel performance although the Apple documentation says something different:

The constant address space is optimized for multiple instances executing a graphics or kernel function accessing the same location in the buffer.

Questions:

Upvotes: 2

Views: 529

Answers (1)

Taylor
Taylor

Reputation: 6410

In your example code, indexing into readonlyBuffer would generate a compiler error.

Assuming readonlyBuffer is declared as a pointer, then the compiler doesn't statically know the size, and can't move the data to the constant memory space.

If readonlyBuffer is small (you only have 4KB of constant memory to work with), put it into a struct as follows:

struct ReadonlyBuffer {
    float3 values[MAX_BUFFER_SIZE];
};

Then do:

kernel void foo(device   int2*   ranges,  
                constant ReadonlyBuffer& readonlyBuffer,  
                device   float*  results,  
                uint lno [[ threadgroup_position_in_grid ]]) 

Finally, run a GPU trace ("Capture GPU Frame") and ensure you don't get the following error:

The Compiler was not able to Preload your Buffer. Kernel Function, Buffer Index: 1.

For more info about Buffer Preloading, see: https://developer.apple.com/videos/play/wwdc2016/606/?time=408

Upvotes: 3

Related Questions