Philli

Reputation: 131

Metal kernel writes random numbers into MTLBuffer if it is indexed with a variable - data hazard?

I am trying to implement a high dynamic range algorithm on the GPU, for which I need to calculate a histogram. The Metal code looks like this so far:

kernel void
hist(texture2d_array<half, access::read> inArray [[texture(0)]],
     device float *t [[buffer(0)]], // ignore this
     volatile device uint *histogram [[buffer(1)]],
     uint2 gid [[thread_position_in_grid]])
{
    int4 Y_ldr;
    uint redChannel;

    for (uint i = 0; i < inArray.get_array_size(); i++) {
        Y_ldr = int4(inArray.read(gid, i, 0) * 255);
        redChannel = Y_ldr.r;
        histogram[redChannel]++;
    }
}

The kernel fills half of the histogram (256 entries) with huge numbers; the other half stays zero (the initial values). But when I write

histogram[0] = 1; // just a number
histogram[0] = redChannel; // OR this

I get the correct number at position 0 in both cases. Using atomic_uint doesn't help, and neither does a threadgroup barrier. Trying

  histogram[0]++;

reveals that Metal does not handle data hazards automatically, but the resulting numbers are merely too small due to lost increments, e.g. around 12000. So what is causing the trouble, such that

  1. I get unreasonable numbers, and
  2. exactly half of the array is missed out?
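For completeness, the atomic variant I tried looks roughly like this (a sketch using atomic_uint and atomic_fetch_add_explicit from the Metal standard library; it removes the read-modify-write hazard between threads, but did not fix the broken output):

kernel void
hist_atomic(texture2d_array<half, access::read> inArray [[texture(0)]],
            device atomic_uint *histogram [[buffer(1)]],
            uint2 gid [[thread_position_in_grid]])
{
    for (uint i = 0; i < inArray.get_array_size(); i++) {
        // Quantize the red channel to a bin index in [0, 255].
        uint redChannel = uint(inArray.read(gid, i, 0).r * 255.0h);
        // Atomically increment the bin; Metal only supports relaxed ordering.
        atomic_fetch_add_explicit(&histogram[redChannel], 1u, memory_order_relaxed);
    }
}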

If you need to know how the pipeline state is set up, see here:

var threadGroupCount = MTLSizeMake(8, 8, 1)
var threadgroups = MTLSizeMake(pictures!.width / threadGroupCount.width, pictures!.height / threadGroupCount.height, 1)
computeCommandEncoder.setComputePipelineState(hist!)
computeCommandEncoder.setTexture(pictures, atIndex: 0)
computeCommandEncoder.setBuffer(exposure_times, offset: 0, atIndex: 0)
computeCommandEncoder.setBuffer(histogram, offset: 0, atIndex: 1) // <-- this is important!!!
computeCommandEncoder.dispatchThreadgroups(threadgroups, threadsPerThreadgroup: threadGroupCount)

Upvotes: 3

Views: 820

Answers (1)

Philli

Reputation: 131

jesus -.- On the iPad's CPU side, Swift's UInt is a 64-bit number (8 bytes), while the GPU's uint is 32 bits (4 bytes). So when the kernel writes to its index [1], the CPU sees that write land in the upper 4 bytes of its array entry [0]. That causes the insanely huge numbers, and it also means the GPU's 256 entries only cover the first half of the CPU array, leaving the rest zero. To solve this issue, I have to declare the histogram as [UInt32] on the CPU side.
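A minimal sketch of the CPU-side fix (the variable names and the use of the current makeBuffer API are illustrative; the point is that each array element must be 4 bytes, matching the kernel's uint):

// Allocate the histogram as 32-bit integers so each entry
// has the same size as the uint the Metal kernel writes.
var histogramData = [UInt32](repeating: 0, count: 256)
let histogramBuffer = device.makeBuffer(bytes: &histogramData,
                                        length: histogramData.count * MemoryLayout<UInt32>.stride,
                                        options: [])

When reading the results back, the buffer contents must likewise be bound as UInt32, e.g. histogramBuffer!.contents().bindMemory(to: UInt32.self, capacity: 256).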

Upvotes: 3
