user007

Reputation: 2172

Running MPSImageIntegral for uint32_t

So, I have been trying to run some MPS kernels. This follows my previous question, "MPSImageIntegral returning all zeros", where I was trying to run MPSImageIntegral on float values. Now I have moved on to uint32_t values, but it turns out I always hit an assert:

/BuildRoot/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MetalPerformanceShaders-121.4.2/MPSImage/Filters/MPSIntegral.mm:196: failed assertion `Destination 0x600003b62760 texture format does not match source 0x600003b62680 texture format'

The assert is misleading, since my texture types are not a mismatch.

This is what I do to create my MTLTexture:

+ (id<MTLTexture>) createTestTexture: (float)val metalDevice:(id<MTLDevice>)device textureWidth:(int)widthTex
{
    std::vector<uint32_t> testData;
    for(int i = 0; i < widthTex; i++)
        testData.push_back(i);

    MTLTextureDescriptor* desc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatR32Uint 
                    width: widthTex height:1 mipmapped:NO];
    [desc setResourceOptions:MTLResourceStorageModeManaged];
    [desc setStorageMode:MTLStorageModeManaged];
    [desc setUsage:(MTLTextureUsageShaderRead | MTLTextureUsageShaderWrite)];

    id<MTLTexture> tex = [device newTextureWithDescriptor:desc];

    MTLRegion texDataRegion = MTLRegionMake2D(0, 0, widthTex, 1);
    [tex replaceRegion:texDataRegion mipmapLevel:0 withBytes:testData.data() bytesPerRow:1024];
    return tex;
}

This is the function that I use to create both my input and output texture. Then I go on to run my MPSImageIntegral like this:

id<MTLTexture> inTex = [ViewController createTestTexture:1.0f metalDevice:_device textureWidth:100];
id<MTLTexture> outTex = [ViewController createTestTexture:1.0f metalDevice:_device textureWidth:100];

id<MTLCommandQueue> _commandQueue = [_device newCommandQueue];
id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];

// Create an MPS filter.
MPSImageIntegral *integral = [[MPSImageIntegral alloc] initWithDevice:_device];
[integral encodeToCommandBuffer:commandBuffer sourceTexture:inTex destinationTexture:outTex];

Based on the documentation here: https://developer.apple.com/documentation/metalperformanceshaders/image_filters?language=objc MPSImageIntegral supports MTLPixelFormatR32Uint. Is there something wrong with what I'm doing here?

Upvotes: 0

Views: 128

Answers (1)

Ian Ollmann

Reputation: 21

So, first a bit of caution here. Image integrals sound very nice in the literature, but they can easily fall over due to precision. A single-precision float has 24 bits of precision (including the implicit 1 bit), 8 bits of exponent and a sign bit. So if you add up more than about 65793 8-bit pixels (2^24 / 255), the sum will no longer have enough precision to hold everything and will start rounding. Typically, image integrals are used to do things like get an area average by subtracting the integral at one point from the integral at another point. If the area is large enough that the sums round, those subtractions will yield inexact results, and for large enough areas you are likely to get garbage, or at least noise, in the output image. 2^16 pixels isn't that much. It would be a 256x256 postage stamp, which isn't really very big on modern devices.
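To see where that rounding kicks in, here is a small C++ sketch (it also compiles as Objective-C++). It is not MPS code: `firstInexactCount` is a hypothetical helper that keeps a float running sum next to an exact 64-bit sum and reports the first pixel count at which the two disagree. It assumes IEEE 754 single precision with the default round-to-nearest mode.

```cpp
#include <cstdint>

// Hypothetical helper, not part of MPS: adds the same pixel value repeatedly
// into a float accumulator and into an exact 64-bit integer accumulator.
// Returns the pixel count at which the float sum first differs from the
// exact sum, or 0 if they still agree after maxPixels additions.
// Assumes IEEE 754 single precision (24-bit significand).
std::uint64_t firstInexactCount(std::uint8_t pixelValue, std::uint64_t maxPixels) {
    float floatSum = 0.0f;
    std::uint64_t exactSum = 0;
    for (std::uint64_t n = 1; n <= maxPixels; ++n) {
        floatSum += static_cast<float>(pixelValue);
        exactSum += pixelValue;
        // The float sum is always integer-valued here, so the cast is lossless.
        if (static_cast<std::uint64_t>(floatSum) != exactSum) {
            return n;
        }
    }
    return 0;
}
```

With all-white 8-bit pixels, `firstInexactCount(255, 1u << 20)` should come back just past 65793, because every sum up to 2^24 is still exact. Darker images last longer before rounding, but the worst case is what sets the safe area.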

Doing similar things with uint32_ts will get you a bit more headroom before modulo overflow. You'll have 2^(32-image_bit_depth) pixels to work with instead of 2^(24-image_bit_depth) for float. However, it probably restricts you to integer pixel representations, because the texture unit will not automagically convert fp16 to uint32_t without a special kernel for it. So, if you are working with fp16, I wouldn't expect that to work unless Apple decided to write a separate kernel for that case. You have uint32_t samples here.
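The headroom formula above can be written out as a quick C++ sketch. Both function names are hypothetical, chosen for illustration only; the second one just shows that a uint32_t accumulator wraps modulo 2^32 rather than rounding the way a float does.

```cpp
#include <cstdint>

// Worst-case number of pixels that can be summed before the accumulator
// runs out of bits: 2^(accumulatorBits - imageBitDepth).
// Use accumulatorBits = 24 for float (significand width) and 32 for uint32_t.
// Requires imageBitDepth <= accumulatorBits.
constexpr std::uint64_t headroomPixels(unsigned accumulatorBits, unsigned imageBitDepth) {
    return std::uint64_t{1} << (accumulatorBits - imageBitDepth);
}

// Unsigned arithmetic wraps modulo 2^32 instead of rounding, so an
// overflowing uint32_t integral silently restarts near zero.
constexpr std::uint32_t wrappingAdd(std::uint32_t runningSum, std::uint32_t pixel) {
    return static_cast<std::uint32_t>(runningSum + pixel);
}
```

For 8-bit pixels, `headroomPixels(24, 8)` is 65536 and `headroomPixels(32, 8)` is 16777216. With full-range 32-bit inputs, `headroomPixels(32, 32)` is 1, so the worst case overflows as soon as a second pixel is added.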

Some things that may lead to all 0s:

  1. On discrete devices, you need to synchronize the resources with an MTLBlitCommandEncoder before you can read the data back out. You have marked the texture as a managed resource, but I don't see the synchronization code here, so my money is on this one. If you don't synchronize, you get zeros no matter what happens. https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/MTLBestPracticesGuide/ResourceOptions.html

  2. Though probably not in this case, it is possible you didn't copy the right data into the input texture up front. As Mattijs points out, rowBytes means things and you can't just throw numbers around. The value should typically be sizeof(pixel) * width, but might be larger if you have padding before the next scanline starts, or if the image is drawn from a rectangular tile inside another image. If it comes from a CGImage, the CGImage can tell you the rowBytes for the image provider; vImage_Buffers carry it in the struct itself; CVPixelBuffers report it via CVPixelBufferGetBytesPerRow(); and so on. RowBytes is the distance in bytes from the start of the pixel at the origin to the start of the first pixel on row 1. If images were 2D arrays, it would be (uintptr_t) &image[1][0] - (uintptr_t) &image[0][0]. Obviously this doesn't work if the image was allocated as an array of pointers to individually allocated 1D arrays. (Nobody does this, it would be dumb.) Since you have height=1, that shouldn't matter in this case, but you may as well pass a number that MPS expects.

  3. Since you are using uint32_t for both input and output buffers, there could trivially be modulo overflow after the 1st pixel. I personally would look at using uint8_t input buffers to make this less imminent. Possibly the MPS function is looking at this and saying, "No way is this going to work! I'll just quietly fail." If it does, that is probably bug-worthy on grounds of general unhelpfulness.

  4. Not everything goes MPS's way all the time. Sometimes compilations fail. Make sure your sandboxing / ASL spew squelching is not inhibiting MPS error spew on your DEBUG target and you are using appropriate MPS options to trigger extra debug spew, e.g. MPSKernelOptionsVerbose. If the kernel doesn't run, possibly because there is no kernel, then you'll see zeros. You could try initializing your output buffer to something other than zero to see if anything was written to it.

  5. You may wish to try an actual 2D image as a sanity check. This is obviously a two-pass separable filter, and the 1D, 1-pass special case could be buggy.
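As a rough illustration of the rowBytes rule in item 2, here is a C++ sketch (it also compiles as Objective-C++). `rowBytesFor` and `measuredRowBytes` are hypothetical helper names, not MPS or Metal API. The first computes the tightly packed stride plus optional padding; the second measures the stride of a contiguous 2D array using the pointer difference described above.

```cpp
#include <cstddef>
#include <cstdint>

// Tightly packed row stride: sizeof(pixel) * width, plus any padding bytes
// the buffer has between scanlines (0 for a plain contiguous array).
template <typename Pixel>
constexpr std::size_t rowBytesFor(std::size_t width, std::size_t paddingBytes = 0) {
    return sizeof(Pixel) * width + paddingBytes;
}

// The definition from item 2: distance in bytes from the start of row 0
// to the start of row 1 of a contiguous 2D array.
template <typename Pixel, std::size_t Height, std::size_t Width>
std::size_t measuredRowBytes(const Pixel (&image)[Height][Width]) {
    static_assert(Height >= 2, "need at least two rows to measure a stride");
    return static_cast<std::size_t>(
        reinterpret_cast<std::uintptr_t>(&image[1][0]) -
        reinterpret_cast<std::uintptr_t>(&image[0][0]));
}
```

For the texture in the question (R32Uint, width 100), `rowBytesFor<std::uint32_t>(100)` gives 400 bytes rather than the 1024 that was passed to replaceRegion.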

Upvotes: 1
