horseshoe7

Reputation: 2837

How do I write a CIFilter / CIKernel that returns a value / smaller image?

So I'm clear on how to write a CoreImage CIFilter that uses a CIColorKernel or CIBlendKernel. Essentially these kernels transform one input pixel into one output pixel.

This is not my question. Nor is it my question to use out-of-the-box filters.

Is it possible to either iterate over all pixels via a typical CIColorKernel function, while having access to a value(s) that I pass in as a pointer?

OR, is it possible to calculate an average over a ROI then return an image of width/height 1 ? i.e. the filter invokes the Metal function once?

Basically, I want to calculate the median color of any pixels that have a non-zero alpha channel. So is the term for this a "reduction filter" ?

I'm looking for the general form of such a Metal function, and what Kernel type that would be.

Is this even possible? I'd like to offload a lot of this computation onto the GPU, and the Apple documentation is a little light, beyond the superficial cases.

Upvotes: 0

Views: 259

Answers (1)

Kamil.S

Reputation: 5543

Here's a naive average (not median!) implementation using a Metal compute kernel. For performance, consider how Apple implements CIAreaAverage, as described in the answer here.

I understand why you considered a 1x1 pixel output texture, but that's not necessarily what you want.

In the "hello shader" examples you see an input texture being processed into an output texture. A straightforward, easy to understand example is converting RGB to greyscale: each pixel's RGB components are used to calculate its greyscale value. The GPU processes the pixels at roughly the same time. The important takeaway is that the output pixels are also written at roughly the same time.

If you limit yourself to a 1x1 pixel output, all your input pixels compete for it and nondeterministically overwrite each other's results.

What you are really after is a synchronized, atomic way of writing to the output "pixel" (or rather to something you will use to calculate the final pixel). In the code I provided this is achieved through the device atomic_int Metal type and atomic_fetch_add_explicit (which is just a fancy way of adding an integer). Obviously, the less simultaneous access there is, the better the performance.

Notice that each channel gets its own Int32 accumulator. This is intentional: the sums could otherwise easily overflow, and at the time of writing 32-bit atomic integers such as device atomic_int were the only Metal types supporting atomic math operations.
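To get a feel for the overflow headroom, here is a small back-of-the-envelope check in Swift. It assumes each pixel adds at most 255 to a channel sum, as the kernel in this answer does; the function name is made up for illustration.

```swift
/// Largest number of pixels a signed 32-bit accumulator can sum without
/// overflowing, assuming every pixel contributes at most `maxChannelValue`.
/// (Hypothetical helper, not part of the original answer.)
func maxPixelsBeforeOverflow(maxChannelValue: Int32 = 255) -> Int {
    return Int(Int32.max / maxChannelValue)
}

let overflowLimit = maxPixelsBeforeOverflow()   // 8_421_504 pixels
let pixelsIn4K = 3840 * 2160                    // 8_294_400 pixels
let fits4K = pixelsIn4K <= overflowLimit        // true
```

So even a fully white 4K frame still fits in one accumulator per channel, but not by a wide margin; much larger images would need another strategy.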

Aside from the input texture, we introduce 4 buffers.
2 input ones:
ROI - an array of 4 integers (origin X, origin Y, width, height) describing your Region of Interest
Alpha threshold - the cutoff alpha below which the pixel is thrown away

2 output ones:
rgbSum - an array of 3 integers; in the code below, indexes 0, 1, 2 hold the red, blue and green channel sums respectively
count - the number of processed pixels, so that we can calculate the arithmetic average

Notice we are not going for a "pure" median, because it would require sorting, which takes away the parallel processing advantage.
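If you do need a median, one possible workaround (not what the code in this answer does) is to let the GPU build a 256-bin histogram per channel with the same kind of atomic adds, and then locate the median bin on the CPU, which needs no sorting. Below is a minimal CPU-side sketch of that idea, assuming 8-bit channel values. Both function names are hypothetical, and histogram(of:) merely stands in for the GPU pass. Note that this yields a per-channel median, which is not necessarily a colour that occurs in the image.

```swift
/// Counts how often each 8-bit value occurs (CPU stand-in for a GPU histogram pass).
func histogram(of samples: [UInt8]) -> [Int] {
    var bins = [Int](repeating: 0, count: 256)
    for sample in samples { bins[Int(sample)] += 1 }
    return bins
}

/// Returns the median value (0...255) described by a 256-bin histogram,
/// or nil when the histogram is empty. For an even number of samples this
/// returns the lower of the two middle values.
func medianFromHistogram(_ bins: [Int]) -> Int? {
    precondition(bins.count == 256, "expected one bin per 8-bit value")
    let total = bins.reduce(0, +)
    guard total > 0 else { return nil }
    let targetRank = (total + 1) / 2   // 1-based rank of the (lower) middle sample
    var seen = 0
    for (value, binCount) in bins.enumerated() {
        seen += binCount
        if seen >= targetRank { return value }
    }
    return nil // not reached for a non-empty histogram
}

let redMedian = medianFromHistogram(histogram(of: [10, 10, 200]))   // 10
```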

The averaging idea is pretty simple: we add up the R, G and B channel values and divide each sum by the visited pixel count. You get the result in the Swift completion handler when the GPU is done.
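When testing the GPU path, a plain CPU version of the same reduction can serve as a reference. The sketch below assumes tightly packed RGBA8 pixel data and uses made-up names (RGBA, averageColor); it is not part of the Metal code, just the same arithmetic written sequentially.

```swift
struct RGBA { var r, g, b, a: UInt8 }

/// Averages the colour of all pixels inside `roi` whose alpha is at least
/// `alphaThreshold` (0...1). Returns nil if no pixel qualifies.
/// The result is on the 0...255 scale.
func averageColor(pixels: [RGBA], width: Int, height: Int,
                  roi: (x: Int, y: Int, width: Int, height: Int),
                  alphaThreshold: Float) -> (r: Float, g: Float, b: Float)? {
    var sumR = 0, sumG = 0, sumB = 0, count = 0
    // Clamp the ROI to the image so we never index out of bounds.
    let xEnd = min(roi.x + roi.width, width)
    let yEnd = min(roi.y + roi.height, height)
    for y in stride(from: max(roi.y, 0), to: yEnd, by: 1) {
        for x in stride(from: max(roi.x, 0), to: xEnd, by: 1) {
            let p = pixels[y * width + x]
            if Float(p.a) / 255 < alphaThreshold { continue }
            sumR += Int(p.r); sumG += Int(p.g); sumB += Int(p.b)
            count += 1
        }
    }
    guard count > 0 else { return nil }
    let n = Float(count)
    return (Float(sumR) / n, Float(sumG) / n, Float(sumB) / n)
}

// 2x2 test image: two opaque pixels, one transparent, one below the threshold.
let testPixels = [RGBA(r: 100, g: 50, b: 0, a: 255), RGBA(r: 200, g: 150, b: 100, a: 255),
                  RGBA(r: 255, g: 255, b: 255, a: 0), RGBA(r: 0, g: 0, b: 0, a: 64)]
let reference = averageColor(pixels: testPixels, width: 2, height: 2,
                             roi: (x: 0, y: 0, width: 2, height: 2),
                             alphaThreshold: 0.5)   // (150, 100, 50)
```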

import Metal
import CoreVideo

class MetalHelper {
    let metalDevice = MTLCreateSystemDefaultDevice()!
    var computePipelineState: MTLComputePipelineState?
    var textureCache: CVMetalTextureCache!
    lazy var commandQueue: MTLCommandQueue? = {
        return self.metalDevice.makeCommandQueue()
    }()

    init(shaderFunctionName: String) {
        let defaultLibrary = metalDevice.makeDefaultLibrary()!
        let kernelFunction = defaultLibrary.makeFunction(name: shaderFunctionName)
        do {
            computePipelineState = try metalDevice.makeComputePipelineState(function: kernelFunction!)
        } catch {
            print("Could not create pipeline state: \(error)")
        }
        // The texture cache is needed to wrap CVPixelBuffers in Metal textures.
        var cache: CVMetalTextureCache?
        CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, metalDevice, nil, &cache)
        textureCache = cache
    }

    // This helper was referenced but not shown in the original answer;
    // what follows is a minimal sketch of what it is assumed to do.
    func makeTextureFromCVPixelBuffer(pixelBuffer: CVPixelBuffer, textureFormat: MTLPixelFormat) -> MTLTexture? {
        guard let cache = textureCache else { return nil }
        let width = CVPixelBufferGetWidth(pixelBuffer)
        let height = CVPixelBufferGetHeight(pixelBuffer)
        var cvTextureOut: CVMetalTexture?
        CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, cache, pixelBuffer, nil,
                                                  textureFormat, width, height, 0, &cvTextureOut)
        guard let cvTexture = cvTextureOut else { return nil }
        return CVMetalTextureGetTexture(cvTexture)
    }

    func process(pixelBuffer: CVPixelBuffer) {
        // Reuse the pipeline state and the lazily created queue.
        guard let pipelineState = computePipelineState,
              let commandQueue = commandQueue,
              let inputTexture = makeTextureFromCVPixelBuffer(pixelBuffer: pixelBuffer, textureFormat: .bgra8Unorm) else { return }

        let inputRoi: [Int32] = [0,    // origin X
                                 0,    // origin Y
                                 100,  // width
                                 100]  // height
        let inputThresholdAlpha: [Float] = [0.5]

        // Create all resources before the encoder, so an early return never
        // leaves an encoder without endEncoding().
        // Buffers created with makeBuffer(length:) start out zero-filled.
        guard let roiBuffer = metalDevice.makeBuffer(bytes: inputRoi, length: inputRoi.count * MemoryLayout<Int32>.stride),
              let thresholdBuffer = metalDevice.makeBuffer(bytes: inputThresholdAlpha, length: inputThresholdAlpha.count * MemoryLayout<Float>.stride),
              let rgbSumOutput = metalDevice.makeBuffer(length: 3 * MemoryLayout<Int32>.stride),
              let count = metalDevice.makeBuffer(length: MemoryLayout<Int32>.stride),
              let commandBuffer = commandQueue.makeCommandBuffer(),
              let commandEncoder = commandBuffer.makeComputeCommandEncoder() else { return }

        commandEncoder.setComputePipelineState(pipelineState)
        commandEncoder.setTexture(inputTexture, index: 0)
        commandEncoder.setBuffer(roiBuffer, offset: 0, index: 0)
        commandEncoder.setBuffer(thresholdBuffer, offset: 0, index: 1)
        commandEncoder.setBuffer(rgbSumOutput, offset: 0, index: 2)
        commandEncoder.setBuffer(count, offset: 0, index: 3)

        // Set up the thread groups.
        let width = pipelineState.threadExecutionWidth
        let height = pipelineState.maxTotalThreadsPerThreadgroup / width
        let threadsPerThreadgroup = MTLSizeMake(width, height, 1)
        let threadgroupsPerGrid = MTLSize(width: (inputTexture.width + width - 1) / width,
                                          height: (inputTexture.height + height - 1) / height,
                                          depth: 1)

        commandEncoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)

        commandEncoder.endEncoding()

        commandBuffer.addCompletedHandler { _ in
            let rgbSumOutputPointer = rgbSumOutput.contents().bindMemory(to: Int32.self, capacity: 3)
            let pixelCount = count.contents().bindMemory(to: Int32.self, capacity: 1)[0]
            // Nothing to average if no pixel passed the ROI and alpha tests.
            guard pixelCount > 0 else { return }

            // The kernel stores the sums in the order red, blue, green.
            let red = Float(rgbSumOutputPointer[0]) / Float(pixelCount)
            let blue = Float(rgbSumOutputPointer[1]) / Float(pixelCount)
            let green = Float(rgbSumOutputPointer[2]) / Float(pixelCount)

            // Average colour on the 0...255 scale.
            let finalPixel = (red: red, green: green, blue: blue)
            // do something with final pixel
            print(finalPixel)
        }
        // Nothing runs on the GPU until the command buffer is committed.
        commandBuffer.commit()
    }
}
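The threadgroup arithmetic in process(pixelBuffer:) is a ceiling division: the grid is rounded up so that partial tiles at the right and bottom edges are still covered. Here is a standalone sketch of just that calculation, using a hypothetical function name and plain Ints instead of MTLSize so that it runs without a GPU.

```swift
/// Number of threadgroups needed along each axis to cover an image,
/// rounding up so partially filled groups at the edges are included.
func threadgroupCount(imageWidth: Int, imageHeight: Int,
                      groupWidth: Int, groupHeight: Int) -> (x: Int, y: Int) {
    precondition(groupWidth > 0 && groupHeight > 0, "threadgroup size must be positive")
    return ((imageWidth + groupWidth - 1) / groupWidth,
            (imageHeight + groupHeight - 1) / groupHeight)
}

let groups = threadgroupCount(imageWidth: 1920, imageHeight: 1080,
                              groupWidth: 32, groupHeight: 32)
// groups.x == 60, groups.y == 34: 34 * 32 = 1088 rows of threads for 1080 rows of pixels
```

Because of that rounding, some threads fall outside the image, so the kernel should compare gid against the texture size before reading.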

The actual kernel shader function

#include <metal_stdlib>
#include <metal_atomic>
using namespace metal;
    kernel void averageCompute(texture2d<half, access::read> inTexture [[ texture(0) ]],
                               device const int *roi [[ buffer(0) ]],
                               device const float *alphaThreshold [[ buffer(1) ]],
                               device atomic_int *rgbSum [[ buffer(2) ]],
                               device atomic_int *count [[ buffer(3) ]],
                               uint2 gid [[ thread_position_in_grid ]]) {
        // the grid is rounded up to whole threadgroups, so skip threads outside the texture
        if (gid.x >= inTexture.get_width() || gid.y >= inTexture.get_height()) {
            return;
        }
        // check if within RegionOfInterest; roi = (origin X, origin Y, width, height)
        int x = (int)gid.x;
        int y = (int)gid.y;
        if (x < roi[0] || y < roi[1] || x >= roi[0] + roi[2] || y >= roi[1] + roi[3]) {
            return;
        }
        // read() returns all four channels, so this must be half4 (alpha is used below)
        half4 pixelAtCoordinates = inTexture.read(gid);
        // filter out pixels with too low alpha
        if (pixelAtCoordinates.a < alphaThreshold[0]) {
            return;
        }
        // accumulate each channel on the 0...255 scale; sums are stored as red, blue, green
        int red = (int)(pixelAtCoordinates.r * 255.0);
        atomic_fetch_add_explicit(&rgbSum[0],
                                  red,
                                  memory_order_relaxed);

        int blue = (int)(pixelAtCoordinates.b * 255.0);
        atomic_fetch_add_explicit(&rgbSum[1],
                                  blue,
                                  memory_order_relaxed);

        int green = (int)(pixelAtCoordinates.g * 255.0);
        atomic_fetch_add_explicit(&rgbSum[2],
                                  green,
                                  memory_order_relaxed);

        // one more pixel contributed to the sums
        atomic_fetch_add_explicit(&count[0],
                                  1,
                                  memory_order_relaxed);
    }

Usage: MetalHelper(shaderFunctionName: "averageCompute")

Upvotes: 0
