zzyzy

Reputation: 983

Metal kernel shader not working

I am baffled as to why my kernel shader isn't working.

I have a bona fide raw RGBA32 pixel buffer (inBuffer) that I send to the kernel shader. I also have a receiving MTLTexture whose RGBA8Unorm descriptor sets its usage to MTLTextureUsageRenderTarget.

I then dispatch the encoding thusly...

id<MTLLibrary> library = [_device newDefaultLibrary];
id<MTLFunction> kernelFunction = [library newFunctionWithName:@"stripe_Kernel"];
id<MTLComputePipelineState> pipeline = [_device newComputePipelineStateWithFunction:kernelFunction error:&error];
id<MTLCommandQueue> commandQueue = [_device newCommandQueue];
MTLTextureDescriptor *textureDescription = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA8Unorm
                                                                                              width:outputSize.width
                                                                                             height:outputSize.height
                                                                                          mipmapped:NO];
[textureDescription setUsage:MTLTextureUsageRenderTarget];
id<MTLTexture> metalTexture = [_device newTextureWithDescriptor:textureDescription];

MTLSize threadgroupCounts = MTLSizeMake(8, 8, 1);
MTLSize threadgroups = MTLSizeMake([metalTexture width] / threadgroupCounts.width,
                                   [metalTexture height] / threadgroupCounts.height, 1);

...

id<MTLBuffer> metalBuffer = [_device newBufferWithBytesNoCopy:inBuffer
                                                       length:inputByteCount
                                                       options:MTLResourceStorageModeShared
                                                      deallocator:nil];

    [commandEncoder setComputePipelineState:pipeline];
    [commandEncoder setTexture:metalTexture atIndex:0];
    [commandEncoder setBuffer:metalBuffer offset:0 atIndex:0];
    [commandEncoder setBytes:&imageW length:sizeof(ushort) atIndex:1];
    [commandEncoder setBytes:&imageH length:sizeof(ushort) atIndex:2];

    [commandEncoder dispatchThreadgroups:threadgroups threadsPerThreadgroup:threadgroupCounts];
    [commandEncoder endEncoding];

    [commandBuffer commit];
    [commandBuffer waitUntilCompleted];
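
One detail in the dispatch code is worth checking. threadgroups is computed with truncating integer division, so the partial strip of pixels at the right or bottom edge is never dispatched whenever the texture width or height is not a multiple of 8. 2048x896 happens to divide evenly, so this is not the cause of the zeros here. Below is a minimal C sketch of the usual round-up arithmetic; the helper names threadgroups_for and threadgroups_truncated are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Threadgroups needed along one axis so that
   groups * threads_per_group >= extent (round-up division). */
static size_t threadgroups_for(size_t extent, size_t threads_per_group)
{
    assert(threads_per_group > 0);
    return (extent + threads_per_group - 1) / threads_per_group;
}

/* Truncating division, as in the original snippet, for comparison:
   drops the last partial group when extent % threads_per_group != 0. */
static size_t threadgroups_truncated(size_t extent, size_t threads_per_group)
{
    assert(threads_per_group > 0);
    return extent / threads_per_group;
}
```

The rounded-up count, for example MTLSizeMake(threadgroups_for(width, 8), threadgroups_for(height, 8), 1), covers the whole texture. The kernel should then return early when gid.x or gid.y falls outside outTexture.get_width() or get_height().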

The intent is to take a raw image that is m x n in size and pack it into a texture that is, say, 2048x896. Here's my kernel shader:

kernel void stripe_Kernel(texture2d<float, access::write> outTexture [[ texture(0) ]],
                      device const float *inBuffer [[ buffer(0) ]],
                      device const ushort * imageWidth [[ buffer(1) ]],
                      device const ushort * imageHeight [[ buffer(2) ]],
                      uint2 gid [[ thread_position_in_grid ]])
{
    const ushort imageW = *imageWidth;
    const ushort imageH = *imageHeight;

    const uint32_t textureW = outTexture.get_width();  // eg. 2048

    uint32_t posX = gid.x;  // eg. 0...2047
    uint32_t posY = gid.y;  // eg. 0...895

    uint32_t sourceX = ((int)(posY/imageH)*textureW + posX) % imageW;
    uint32_t sourceY = (int)(posY% imageH);

    const uint32_t ptr = (sourceX + sourceY* imageW);
    float pixel = inBuffer[ptr];

    outTexture.write(pixel, gid);
}
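
The kernel's index math can be checked without the GPU by mirroring the mapping on the CPU and comparing the result against the texture readback. Below is a minimal C sketch; stripe_source_index and stripe_reference are hypothetical helpers. Like the kernel, it assumes one float per pixel in inBuffer:

```c
#include <assert.h>
#include <stdint.h>

/* CPU mirror of the index math in stripe_Kernel: returns the element
   index into inBuffer that output pixel (posX, posY) reads from. */
static uint32_t stripe_source_index(uint32_t posX, uint32_t posY,
                                    uint32_t textureW,
                                    uint16_t imageW, uint16_t imageH)
{
    assert(imageW > 0 && imageH > 0);
    uint32_t stripe  = posY / imageH;                        /* which band of imageH rows */
    uint32_t sourceX = (stripe * textureW + posX) % imageW;  /* x continues across bands */
    uint32_t sourceY = posY % imageH;                        /* row within the band */
    return sourceX + sourceY * imageW;
}

/* Fill a textureW x textureH float array the way the kernel would. */
static void stripe_reference(const float *in, float *out,
                             uint32_t textureW, uint32_t textureH,
                             uint16_t imageW, uint16_t imageH)
{
    for (uint32_t y = 0; y < textureH; ++y)
        for (uint32_t x = 0; x < textureW; ++x)
            out[y * textureW + x] =
                in[stripe_source_index(x, y, textureW, imageW, imageH)];
}
```

For example, take a 4-pixel-wide texture and a 6x2 image. Output pixel (0, 2) starts the second band of rows, reads source column 4 and therefore maps to inBuffer[4].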

I later read that texture back and copy it into a CVPixelBuffer:

MTLRegion region = MTLRegionMake2D(0, 0, (int)outputSize.width, (int)outputSize.height);
// lock buffers, copy texture over
CVPixelBufferLockBaseAddress(outBuffer, 0);
void *pixelData = CVPixelBufferGetBaseAddress(outBuffer);
[metalTexture getBytes:pixelData
           bytesPerRow:CVPixelBufferGetBytesPerRow(outBuffer)
            fromRegion:region
           mipmapLevel:0];
CVPixelBufferUnlockBaseAddress(outBuffer, 0);
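
When the copied buffer looks empty, it can help to scan the pixel bytes while the CVPixelBuffer is still locked. Rows can be padded, so CVPixelBufferGetBytesPerRow may be larger than width * 4, and the scan should skip the padding. Below is a minimal C sketch, assuming 4 bytes per pixel to match the RGBA8Unorm texture; count_nonzero_pixel_bytes is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count non-zero bytes in the visible part of a pixel buffer whose rows
   may be padded: only the first width * bytes_per_pixel bytes of each row
   hold pixels; the rest (up to bytes_per_row) is padding and is skipped. */
static size_t count_nonzero_pixel_bytes(const uint8_t *base,
                                        size_t width, size_t height,
                                        size_t bytes_per_row,
                                        size_t bytes_per_pixel)
{
    assert(bytes_per_row >= width * bytes_per_pixel);
    size_t count = 0;
    for (size_t y = 0; y < height; ++y) {
        const uint8_t *row = base + y * bytes_per_row;
        for (size_t i = 0; i < width * bytes_per_pixel; ++i)
            count += (row[i] != 0);
    }
    return count;
}
```

Call it as count_nonzero_pixel_bytes((const uint8_t *)pixelData, width, height, CVPixelBufferGetBytesPerRow(outBuffer), 4) before CVPixelBufferUnlockBaseAddress. A result of 0 confirms that the getBytes copy itself produced zeros.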

My problem is this: my CVPixelBuffer always comes up empty (allocated, but all zeros). I'm running on an iMac 17,1 with a Radeon M395 GPU.

I've even gone so far as to force opaque red pixels into the output texture in the kernel shader, and I still don't see red.

UPDATE: My solution was to abandon MTLTextures altogether. (I even tried synchronizing the texture with an MTLBlitCommandEncoder, with no luck.)

Instead, I used MTLBuffers for both the input "texture" and the output "texture" and reworked the math in the kernel shader. My output buffer is now a pre-allocated, locked CVPixelBuffer, which is what I wanted in the end anyway.
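
The input buffer in the question is wrapped with newBufferWithBytesNoCopy:length:options:deallocator:. The same call is one way to wrap a locked CVPixelBuffer's base address, though that is an assumption about the reworked setup. Apple's documentation for that method asks for a page-aligned pointer, and rounding the length up to whole pages is a common precaution (assumed here). Below is a minimal POSIX C sketch that checks alignment and allocates page-aligned memory with mmap; is_page_aligned, round_up_to_page and alloc_page_aligned are hypothetical helpers:

```c
#define _DEFAULT_SOURCE 1 /* glibc: expose MAP_ANONYMOUS/sysconf in strict modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* True if ptr sits exactly on a page boundary. */
static int is_page_aligned(const void *ptr)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return ((uintptr_t)ptr % page) == 0;
}

/* Round a byte count up to a whole number of pages. */
static size_t round_up_to_page(size_t length)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (length + page - 1) / page * page;
}

/* Allocate zero-filled, page-aligned memory (mmap returns page-aligned
   addresses). Release with munmap(ptr, round_up_to_page(length)).
   Returns NULL on failure. */
static void *alloc_page_aligned(size_t length)
{
    size_t rounded = round_up_to_page(length);
    void *p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

If is_page_aligned(inBuffer) returns 0, there are two ways around it. You can copy the data into memory from alloc_page_aligned first, or use newBufferWithBytes:length:options:, which copies the data for you.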

Upvotes: 0

Views: 1392

Answers (1)

alexst

Reputation: 631

First, with MTLTextureUsage.renderTarget I get the error "validateComputeFunctionArguments:825: failed assertion `Function writes texture (outTexture[0]) whose usage (0x04) doesn't specify MTLTextureUsageShaderWrite (0x02)'", so the usage should probably be MTLTextureUsage.shaderWrite.
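
The validation message spells out the bit values involved. The C sketch below mirrors them with hypothetical local constants, adding ShaderRead (0x01) from the Metal headers. It shows why RenderTarget (0x04) alone fails the check for a kernel that writes the texture. Real code would use the MTLTextureUsage constants instead:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local mirror of MTLTextureUsage bit values; real code
   should use MTLTextureUsageShaderRead/ShaderWrite/RenderTarget. */
enum {
    USAGE_SHADER_READ   = 0x01,
    USAGE_SHADER_WRITE  = 0x02,
    USAGE_RENDER_TARGET = 0x04
};

/* A compute kernel that writes a texture needs the shader-write bit;
   the render-target bit does not imply it. */
static int usage_allows_kernel_write(uint32_t usage)
{
    return (usage & USAGE_SHADER_WRITE) != 0;
}
```

In the question's Objective-C, the fix corresponds to [textureDescription setUsage:MTLTextureUsageShaderWrite | MTLTextureUsageShaderRead]. The read bit matters only if a shader later reads the texture.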

For some reason, if I force the Intel GPU with gfxSwitch, the readback from the texture returns correct data. With the Radeon it's always zero, regardless of the storage mode I set (textureDesc.resourceOptions = MTLResourceOptions.storageModeXXX).

What has worked for me on both the Intel GPU and a Radeon 460 was creating an MTLBuffer and using it instead of the texture. You have to calculate the index yourself, though. Switching to buffers should not be a big deal if you're not using mipmapping or sampling with float coordinates, right?

let byteCount = 4 * width * height  // 4 bytes per pixel
let texBuffer = device?.makeBuffer(length: byteCount, options: MTLResourceOptions.storageModeShared)

// Read back as many Floats as the buffer actually holds
var result = [Float](repeating: 0, count: byteCount / MemoryLayout<Float>.stride)
let data = NSData(bytesNoCopy: texBuffer!.contents(), length: byteCount, freeWhenDone: false)
data.getBytes(&result, length: byteCount)
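
Calculating the index by hand is plain row-major arithmetic. The C sketch below assumes a tightly packed float buffer with a configurable number of channels per pixel. element_index, buffer_length and read_floats are hypothetical helpers. With one channel, buffer_length gives the same 4 * width * height byte count as above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Element index of channel c of pixel (x, y) in a tightly packed,
   row-major float buffer with `channels` floats per pixel. */
static size_t element_index(size_t x, size_t y, size_t width,
                            size_t channels, size_t c)
{
    assert(c < channels);
    return (y * width + x) * channels + c;
}

/* Byte length of such a buffer. */
static size_t buffer_length(size_t width, size_t height, size_t channels)
{
    return width * height * channels * sizeof(float);
}

/* Copy raw buffer bytes into a float array, the C analogue of the
   NSData getBytes call; memcpy avoids alignment/aliasing issues. */
static void read_floats(const void *contents, float *out, size_t count)
{
    memcpy(out, contents, count * sizeof(float));
}
```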

I would assume that creating a texture backed by an MTLBuffer would also work, but that API is only available on macOS 10.13.
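
You create a texture from a buffer with newTextureWithDescriptor:offset:bytesPerRow:. Its row pitch generally has to be padded to an alignment that the device reports through minimumLinearTextureAlignmentForPixelFormat:, and the buffer must hold at least bytesPerRow * height bytes. Both APIs are macOS 10.13+. The C sketch below shows that padding; the helper names are hypothetical. The alignment of 256 used in the checks is an illustrative value, not one queried from real hardware:

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of align (align > 0). */
static size_t round_up(size_t x, size_t align)
{
    assert(align > 0);
    return (x + align - 1) / align * align;
}

/* Row pitch for a linear texture, padded to the required alignment
   (the real alignment must be queried from the device). */
static size_t linear_bytes_per_row(size_t width, size_t bytes_per_pixel,
                                   size_t align)
{
    return round_up(width * bytes_per_pixel, align);
}

/* Minimum buffer length to back a width x height linear texture. */
static size_t linear_buffer_length(size_t width, size_t height,
                                   size_t bytes_per_pixel, size_t align)
{
    return linear_bytes_per_row(width, bytes_per_pixel, align) * height;
}
```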

EDIT: As Ken Thomases pointed out, there is a similar discussion at "Metal kernels not behaving properly on the new MacBook Pro (late 2016) GPUs".

I made a sample app using the approach and shader from the question, and the fix from the linked thread worked for me. Here is the app code in case anyone wants a reproducible example: https://gist.github.com/astarasikov/9e4f58e540a6ff066806d37eb5b2af29

Upvotes: 1
