Reputation: 1138
Apple has a useful tutorial called Displaying an AR Experience with Metal that shows you how to extract the Y and CbCr textures from an ARFrame's capturedImage property and convert them to RGB for rendering. However, I've run into problems trying to perform the inverse operation on an RGBA texture, i.e. converting it back to the Y and CbCr textures.
I rewrote the fragment shader in the tutorial as a compute shader that writes to an RGBA texture I created from a Metal buffer:
// Same as capturedImageFragmentShader, but as a kernel function instead
kernel void yCbCrToRgbKernel(texture2d<float, access::sample> yTexture    [[ texture(kTextureIndex_Y) ]],
                             texture2d<float, access::sample> cbCrTexture [[ texture(kTextureIndex_CbCr) ]],
                             texture2d<float, access::write>  rgbaTexture [[ texture(kTextureIndex_RGBA) ]],
                             uint2 gid [[ thread_position_in_grid ]])
{
    constexpr sampler colorSampler(mip_filter::linear, mag_filter::linear, min_filter::linear);

    const float4x4 ycbcrToRGBTransform = float4x4(
        float4(+1.0000f, +1.0000f, +1.0000f, +0.0000f),
        float4(+0.0000f, -0.3441f, +1.7720f, +0.0000f),
        float4(+1.4020f, -0.7141f, +0.0000f, +0.0000f),
        float4(-0.7010f, +0.5291f, -0.8860f, +1.0000f)
    );

    float4 ycbcr = float4(yTexture.sample(colorSampler, float2(gid)).r,
                          cbCrTexture.sample(colorSampler, float2(gid)).rg, 1.0);
    float4 result = ycbcrToRGBTransform * ycbcr;
    rgbaTexture.write(result, ushort2(gid));
}
I tried to write a second compute shader to perform the reverse operation, calculating the Y, Cb, and Cr values using the conversion formulae found on the Wikipedia page for YCbCr:
kernel void rgbaToYCbCrKernel(texture2d<float, access::write>  yTexture    [[ texture(kTextureIndex_Y) ]],
                              texture2d<float, access::write>  cbCrTexture [[ texture(kTextureIndex_CbCr) ]],
                              texture2d<float, access::sample> rgbaTexture [[ texture(kTextureIndex_RGBA) ]],
                              uint2 gid [[ thread_position_in_grid ]])
{
    constexpr sampler colorSampler(mip_filter::linear, mag_filter::linear, min_filter::linear);
    float4 rgba = rgbaTexture.sample(colorSampler, float2(gid)).rgba;

    // see https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.709_conversion for conversion formulae
    float Y  = 16.0 + (65.481 * rgba.r + 128.553 * rgba.g + 24.966 * rgba.b);
    float Cb = 128 + (-37.797 * rgba.r + 74.203 * rgba.g + 112.0 * rgba.b);
    float Cr = 128 + (112.0 * rgba.r + 93.786 * rgba.g - 18.214 * rgba.b);

    yTexture.write(Y, gid);
    cbCrTexture.write(float4(Cb, Cr, 0, 0), gid); // this probably is not correct...
}
My problem is how to write data to these textures correctly. I know it's incorrect because the resulting display is a solid pink color; the expected result is, of course, the original, unmodified display.
The pixel formats for the Y, CbCr, and RGBA textures are .r8Unorm, .rg8Unorm, and .rgba8Unorm respectively.
Here is my Swift code for setting up the textures and executing the shaders:
private func createTexture(fromPixelBuffer pixelBuffer: CVPixelBuffer, pixelFormat: MTLPixelFormat, planeIndex: Int) -> MTLTexture? {
    guard CVMetalTextureCacheCreate(kCFAllocatorSystemDefault, nil, device, nil, &capturedImageTextureCache) == kCVReturnSuccess else { return nil }
    var mtlTexture: MTLTexture? = nil
    let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex)
    let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex)
    var texture: CVMetalTexture? = nil
    let status = CVMetalTextureCacheCreateTextureFromImage(nil, capturedImageTextureCache!, pixelBuffer, nil, pixelFormat, width, height, planeIndex, &texture)
    if status == kCVReturnSuccess {
        mtlTexture = CVMetalTextureGetTexture(texture!)
    }
    return mtlTexture
}
func arFrameToRGB(frame: ARFrame) {
    let frameBuffer = frame.capturedImage
    CVPixelBufferLockBaseAddress(frameBuffer, CVPixelBufferLockFlags(rawValue: 0))

    // Extract the Y and CbCr textures
    let capturedImageTextureY = createTexture(fromPixelBuffer: frameBuffer, pixelFormat: .r8Unorm, planeIndex: 0)!
    let capturedImageTextureCbCr = createTexture(fromPixelBuffer: frameBuffer, pixelFormat: .rg8Unorm, planeIndex: 1)!

    // Create the RGBA texture
    let rgbaBufferWidth = CVPixelBufferGetWidthOfPlane(frameBuffer, 0)
    let rgbaBufferHeight = CVPixelBufferGetHeightOfPlane(frameBuffer, 0)
    if rgbaBuffer == nil {
        rgbaBuffer = device.makeBuffer(length: 4 * rgbaBufferWidth * rgbaBufferHeight, options: [])
    }
    let rgbaTextureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Unorm, width: rgbaBufferWidth, height: rgbaBufferHeight, mipmapped: false)
    rgbaTextureDescriptor.usage = [.shaderWrite, .shaderRead]
    let rgbaTexture = rgbaBuffer?.makeTexture(descriptor: rgbaTextureDescriptor, offset: 0, bytesPerRow: 4 * rgbaBufferWidth)

    threadGroupSize = MTLSizeMake(4, 4, 1)
    threadGroupCount = MTLSizeMake((rgbaTexture!.width + threadGroupSize!.width - 1) / threadGroupSize!.width,
                                   (rgbaTexture!.height + threadGroupSize!.height - 1) / threadGroupSize!.height,
                                   1)

    let yCbCrToRGBACommandBuffer = commandQueue.makeCommandBuffer()!
    let yCbCrToRGBAComputeEncoder = yCbCrToRGBACommandBuffer.makeComputeCommandEncoder()!
    yCbCrToRGBAComputeEncoder.setComputePipelineState(yCbCrToRgbPso)
    yCbCrToRGBAComputeEncoder.setTexture(capturedImageTextureY, index: Int(kTextureIndex_Y.rawValue))
    yCbCrToRGBAComputeEncoder.setTexture(capturedImageTextureCbCr, index: Int(kTextureIndex_CbCr.rawValue))
    yCbCrToRGBAComputeEncoder.setTexture(rgbaTexture, index: Int(kTextureIndex_RGBA.rawValue))
    yCbCrToRGBAComputeEncoder.dispatchThreadgroups(threadGroupCount!, threadsPerThreadgroup: threadGroupSize!)
    yCbCrToRGBAComputeEncoder.endEncoding()

    let rgbaToYCbCrCommandBuffer = commandQueue.makeCommandBuffer()!
    let rgbaToYCbCrComputeEncoder = rgbaToYCbCrCommandBuffer.makeComputeCommandEncoder()!
    rgbaToYCbCrComputeEncoder.setComputePipelineState(rgbaToYCbCrPso)
    rgbaToYCbCrComputeEncoder.setTexture(capturedImageTextureY, index: Int(kTextureIndex_Y.rawValue))
    rgbaToYCbCrComputeEncoder.setTexture(capturedImageTextureCbCr, index: Int(kTextureIndex_CbCr.rawValue))
    rgbaToYCbCrComputeEncoder.setTexture(rgbaTexture, index: Int(kTextureIndex_RGBA.rawValue))
    rgbaToYCbCrComputeEncoder.dispatchThreadgroups(threadGroupCount!, threadsPerThreadgroup: threadGroupSize!)
    rgbaToYCbCrComputeEncoder.endEncoding()

    yCbCrToRGBACommandBuffer.commit()
    rgbaToYCbCrCommandBuffer.commit()
    yCbCrToRGBACommandBuffer.waitUntilCompleted()
    rgbaToYCbCrCommandBuffer.waitUntilCompleted()

    CVPixelBufferUnlockBaseAddress(frameBuffer, CVPixelBufferLockFlags(rawValue: 0))
}
The end goal is to use Metal shaders to do image processing on the RGBA texture and eventually write back to the Y and CbCr textures for display on the screen.
Here are the parts I am unsure about:
1. How do I write data in the correct format to these textures, given that the textures in the kernel function are all typed texture2d<float, access::write> but have differing pixel formats?
2. Is my rewrite of capturedImageFragmentShader from Displaying an AR Experience with Metal as a compute shader as simple as I thought, or am I missing something there?
Upvotes: 3
Views: 2630
Reputation: 154
I just had to implement the same thing. Your first issue is a confusion between the values stored in the texture buffer and how those values are presented to the Metal kernel. As is typical in GPU shaders, when integer values are accessed as float they are normalized to [0, 1] on read and scaled back to [0, maxIntValue] on write. For Metal this conversion is documented on page 228 of https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf, section 7.7.1.1 "Converting Normalized Integer Pixel Data Types to Floating-Point Values".
For example, if the texture format of the Y channel is .r8Unorm, the data is stored with one byte per pixel, with values from 0 to 255. But once accessed in the kernel via texture2d<float>, the values will be in [0, 1], and when you write to such a texture, the values are automatically scaled back to [0, 255]. So inside your kernel you should treat the values as lying in [0, 1], not [0, 255], and adjust your transforms accordingly.
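To make that read/write scaling concrete, here is a small CPU-side sketch of my own (not from the Apple sample) modeling what happens to a .r8Unorm luma byte when a kernel reads it as float and writes it back:

```swift
// Hypothetical CPU-side model of the .r8Unorm <-> texture2d<float> conversion.
let storedByte: UInt8 = 235                     // luma value as stored in the texture
let seenInKernel = Double(storedByte) / 255.0   // what the kernel reads: ~0.922, in [0, 1]
let writtenBack = UInt8((seenInKernel * 255.0).rounded())  // scaled back on write: 235 again
```

A kernel that adds a raw 16.0 or 128.0 offset to such a value, as the 0-255 formulae do, therefore pushes it far outside the representable range, which is one reason for the solid-color output.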
The second issue is the RGBA-to-YCbCr transform itself. Assuming that the sample from Apple is correct, we can see that it follows the JPEG convention given at the end of the Wikipedia page; the coefficients match exactly if you replace 128 by 128/255 = 0.5 and put them in matrix form. The extra subtlety is that matrices are initialized in column-major order in Metal code, so the corresponding math operation should read:

       |+1.  +0.      +1.402   -0.701 |   |Y |
RGBA = |+1.  -0.3441  -0.7141  +0.5291| . |Cb|
       |+1.  +1.772   +0.      -0.886 |   |Cr|
       |+0.  +0.      +0.      +1.    |   |1 |
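As a quick sanity check of that row-major reading (a verification sketch of my own, not part of the sample), multiplying the matrix by a neutral-chroma white pixel (Y = 1, Cb = Cr = 0.5 in normalized units) should return RGBA white:

```swift
// Row-major form of the ycbcrToRGBTransform from the question.
let ycbcrToRGB: [[Double]] = [
    [1.0,  0.0,     1.402,  -0.701 ],
    [1.0, -0.3441, -0.7141,  0.5291],
    [1.0,  1.772,   0.0,    -0.886 ],
    [0.0,  0.0,     0.0,     1.0   ],
]
let white: [Double] = [1.0, 0.5, 0.5, 1.0]  // Y, Cb, Cr, 1
// Standard matrix-vector product: dot each row with the input vector.
let rgba = ycbcrToRGB.map { row in zip(row, white).map(*).reduce(0, +) }
// rgba is [1, 1, 1, 1] to within floating-point rounding
```

The chroma terms cancel exactly against the constant column, which is what the 0.5 offsets are for.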
Next, what you need is the inverse transform. You can find it in the same JPEG section of the Wikipedia page (again replacing 128 by 0.5), or, if you want to use the same matrix form, you can simply compute the inverse of the 4x4 matrix and use that. This is what I did, and here is what I got after putting it back into column-major order:
const float4x4 rgbaToYcbcrTransform = float4x4(
    float4(+0.2990, -0.1687, +0.5000, +0.0000),
    float4(+0.5870, -0.3313, -0.4187, +0.0000),
    float4(+0.1140, +0.5000, -0.0813, +0.0000),
    float4(+0.0000, +0.5000, +0.5000, +1.0000)
);
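To convince yourself that the two matrices really are inverses (again a verification sketch of my own, not from the sample), you can multiply their row-major forms and check that the product is the identity, up to the rounding in the published coefficients:

```swift
// Row-major forms of the two transforms discussed in this thread.
let ycbcrToRGB: [[Double]] = [
    [1.0,  0.0,     1.402,  -0.701 ],
    [1.0, -0.3441, -0.7141,  0.5291],
    [1.0,  1.772,   0.0,    -0.886 ],
    [0.0,  0.0,     0.0,     1.0   ],
]
let rgbToYCbCr: [[Double]] = [
    [ 0.2990,  0.5870,  0.1140, 0.0],
    [-0.1687, -0.3313,  0.5000, 0.5],
    [ 0.5000, -0.4187, -0.0813, 0.5],
    [ 0.0,     0.0,     0.0,    1.0],
]
// product[i][j] = sum over k of ycbcrToRGB[i][k] * rgbToYCbCr[k][j]
let product = (0..<4).map { i in
    (0..<4).map { j in
        (0..<4).reduce(0.0) { sum, k in sum + ycbcrToRGB[i][k] * rgbToYCbCr[k][j] }
    }
}
// product is approximately the 4x4 identity; entries are off by less than 1e-4
// because the published coefficients are rounded to four decimal places.
```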
Then, adapting your kernel code, something like this should work (I did not test that exact code; my texture layout is slightly different):

// Ignore alpha as we can't convert it; just set it to 1.
float3 rgb = rgbaTexture.sample(colorSampler, float2(gid)).rgb;
float4 ycbcr = rgbaToYcbcrTransform * float4(rgb, 1.0);
yTexture.write(ycbcr[0], gid);
cbCrTexture.write(float4(ycbcr[1], ycbcr[2], 0, 0), gid);
Upvotes: 2