Metal kernel writes random numbers into MTLBuffer if it is indexed with a variable - data hazard?

Question

I am trying to implement a high dynamic range algorithm on the GPU, for which I need to calculate a histogram. The Metal code looks like this so far:

kernel void
hist(texture2d_array inArray [[texture(0)]],
     device float *t [[buffer(0)]], // ignore this
     volatile device uint *histogram [[buffer(1)]],
     uint2 gid [[thread_position_in_grid]])
{
    int4 Y_ldr;
    uint redChannel;

    for (uint i = 0; i < inArray.get_array_size(); i++) {
        Y_ldr = int4(inArray.read(gid, i, 0) * 255);
        redChannel = Y_ldr.r;
        histogram[redChannel]++;
    }
}
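For comparison, here is a minimal host-side C++ sketch of the same binning step with race-free counters. It is an analogy, not the kernel itself: the names `kBins`, `bin_index`, and `Histogram` are hypothetical, and the clamp assumes that HDR input may produce values outside [0, 1], which the kernel above would turn into an out-of-range index. In Metal, the counterpart of `fetch_add` would be `atomic_fetch_add_explicit` with `memory_order_relaxed` on a `device atomic_uint *` buffer.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int kBins = 256;  // assumed from "256 entries" in the question

// Map a normalized channel value to a bin. Values outside [0, 1] and NaN
// are clamped so they can never index past the end of the histogram.
inline int bin_index(float value) {
    if (!(value > 0.0f)) return 0;          // negatives, zero, and NaN
    if (value >= 1.0f) return kBins - 1;    // saturate bright values
    return static_cast<int>(value * 255.0f);
}

// Histogram whose counters may be incremented from many threads at once.
struct Histogram {
    std::array<std::atomic<std::uint32_t>, kBins> bins;

    Histogram() {
        for (auto& b : bins) b.store(0, std::memory_order_relaxed);
    }

    // Atomic read-modify-write: concurrent callers cannot lose an update,
    // unlike a plain `histogram[bin]++`.
    void record(float value) {
        int bin = bin_index(value);
        assert(bin >= 0 && bin < kBins);
        bins[static_cast<std::size_t>(bin)].fetch_add(1, std::memory_order_relaxed);
    }

    std::uint32_t count(int bin) const {
        return bins[static_cast<std::size_t>(bin)].load(std::memory_order_relaxed);
    }
};
```

Running the same pixels through a reference like this gives known-good counts to compare the GPU output against.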

The kernel fills half of the histogram (256 entries) with huge numbers; the other half is still zero (the initial values). If I instead write

histogram[0] = 1; // just a number
histogram[0] = redChannel; // OR this

I get the correct number at position 0 in both cases. Using atomic_uint doesn't help, and neither does a threadgroup barrier. Trying

  histogram[0]++;

reveals that Metal does not handle data hazards automatically, but the numbers are small, e.g. around 12000. So, what is causing the trouble that

I get unreasonable numbers, and
exactly half of the array is left out?
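One possible cause that would produce both symptoms is an element-width mismatch between kernel and host. This is speculative, since the question does not show how the buffer is allocated or read back. Metal's `uint` is 32 bits wide, while Swift's `UInt` is 64 bits on 64-bit platforms. The C++ sketch below, with a hypothetical helper `read_as_64bit`, simulates a kernel that writes 32-bit counters into a buffer that the host sized and reads as 64-bit integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical simulation: `gpu_counts` holds the 32-bit counters the kernel
// wrote. The host buffer is assumed to be sized for the same number of
// 64-bit elements and zero-initialized.
std::vector<std::uint64_t> read_as_64bit(const std::vector<std::uint32_t>& gpu_counts) {
    const std::size_t bins = gpu_counts.size();

    // Host-side allocation: bins * 8 bytes, all zero.
    std::vector<unsigned char> raw(bins * sizeof(std::uint64_t), 0);

    // The kernel only touches the first bins * 4 bytes.
    std::memcpy(raw.data(), gpu_counts.data(), bins * sizeof(std::uint32_t));

    // Host-side readback: every pair of 32-bit counters fuses into one
    // 64-bit value, and the second half of the array is never written.
    std::vector<std::uint64_t> host(bins);
    std::memcpy(host.data(), raw.data(), raw.size());
    return host;
}
```

With 256 counters that each hold 1, the first 128 host entries read as 4294967297 (two fused counters) and the last 128 stay zero. If this is what is happening, reading the buffer back with a 32-bit element type on the host would be the thing to check.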

If you need to know how the pipeline state is set up, see here:

var threadGroupCount = MTLSizeMake(8, 8, 1)
var threadgroups = MTLSizeMake(pictures!.width/threadGroupCount.width, pictures!.height/threadGroupCount.height, 1)
computeCommandEncoder.setComputePipelineState(hist!)
computeCommandEncoder.setTexture(pictures, atIndex: 0)
computeCommandEncoder.setBuffer(exposure_times, offset: 0, atIndex: 0)
computeCommandEncoder.setBuffer(histogram, offset: 0, atIndex: 1) // <-- this is important!!!
computeCommandEncoder.dispatchThreadgroups(threadgroups, threadsPerThreadgroup: threadGroupCount)
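A side note on the threadgroup count above: the integer division truncates, so when the texture width or height is not a multiple of 8, the pixels along the right and bottom edges are never visited. A common approach is to round the count up, sketched here in C++ with a hypothetical helper `threadgroups_for`. With a rounded-up grid the kernel then needs a bounds guard that returns early when `gid.x` or `gid.y` falls outside `inArray.get_width()` or `inArray.get_height()`.

```cpp
#include <cassert>
#include <cstddef>

// Number of threadgroups needed to cover `extent` pixels with groups of
// `group` threads, rounding up so a partial group covers the remainder.
std::size_t threadgroups_for(std::size_t extent, std::size_t group) {
    assert(group > 0);
    return (extent + group - 1) / group;
}
```

For example, a width of 1001 with groups of 8 gives 126 threadgroups, whereas the truncating `1001 / 8` gives 125 and skips the last pixel column.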
