Andy3000

Reputation: 21

High memory usage issue with metal using custom CIFilter

I’m working on a game in SceneKit, where I’ve written a custom CIKernel filter in Metal that creates scaled passes of what’s on the screen (which is needed for many other reasons in this project).

It works well: CPU usage is low and the FPS is steady. The issue I’m facing is that memory usage is very high, about 1 GB. Before using Metal, I used pure CIFilters for the same effect, and memory usage was about 200 MB, but CPU usage and FPS were terrible, which is why I rewrote the function in Metal.

The code below is what is used to make scaled passes out of a CIImage being sent.

Does anyone see an issue, or something I’ve missed, that would fix the high memory usage? Maybe adding a CVPixelBufferPool or changing the pixel formats? I’m clueless.
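For reference, this is roughly what I had in mind with the pool idea (an untested sketch; the size and minimum buffer count are placeholders):

// Created once, e.g. in init, instead of calling CVPixelBufferCreate per frame
var bufferPool: CVPixelBufferPool?
let poolAttrs = [kCVPixelBufferPoolMinimumBufferCountKey: 3] as CFDictionary
let bufferAttrs = [kCVPixelBufferPixelFormatTypeKey: Int(kCVPixelFormatType_32BGRA),
                   kCVPixelBufferWidthKey: 1920,   // placeholder size
                   kCVPixelBufferHeightKey: 1080,  // placeholder size
                   kCVPixelBufferMetalCompatibilityKey: kCFBooleanTrue] as CFDictionary
CVPixelBufferPoolCreate(kCFAllocatorDefault, poolAttrs, bufferAttrs, &bufferPool)

// Per frame: draw a recycled buffer from the pool
var buffer: CVPixelBuffer?
CVPixelBufferPoolCreatePixelBuffer(kCFAllocatorDefault, bufferPool!, &buffer)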

Thanks!


import CoreImage
import CoreVideo
import Metal
import MetalPerformanceShaders

class ImageEffect: CIFilter {

    private let kernel: CIKernel

    var inputImage: CIImage?
    
    // Effect parameters
    var textureCache: CVMetalTextureCache?
    let device: MTLDevice?
    let ciContext: CIContext?
    let commandQueue: MTLCommandQueue?
    var commandBuffer: MTLCommandBuffer?
    var pixelBuffer: CVPixelBuffer?
    let attrs = [kCVPixelBufferPixelFormatTypeKey: Int(kCVPixelFormatType_32BGRA),
                 kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
         kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue,
                   kCVPixelBufferMetalCompatibilityKey: kCFBooleanTrue] as CFDictionary
    let scales: [Float] = [1.0, 1.0, 1.0]
    var passTex = [MTLTexture]()
    var passImg = [CIImage]()

    override init() {
        let url = Bundle.main.url(forResource: "default", withExtension: "metallib")!
        let data = try! Data(contentsOf: url)

        // Set up support objects
        device = MTLCreateSystemDefaultDevice()
        ciContext = CIContext()
        commandQueue = device?.makeCommandQueue()
        CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device!, nil, &textureCache)

        kernel = try! CIKernel(functionName: "imageeffect", fromMetalLibraryData: data)
        super.init()
    }
    
    required init?(coder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }
    
    override var outputImage: CIImage? {
        guard let inputImage = self.inputImage else { return nil }
        let inputExtent = inputImage.extent
        pixelBuffer = nil
        passTex = []
        passImg = []

        let w = inputExtent.width
        let h = inputExtent.height

        // Render the input image into a fresh pixel buffer
        CVPixelBufferCreate(kCFAllocatorDefault, Int(w), Int(h), kCVPixelFormatType_32BGRA, attrs, &pixelBuffer)
        ciContext!.render(inputImage, to: pixelBuffer!)

        CVPixelBufferLockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags(rawValue: 0))
        let inputTex = PixelBufferToMTLTexture(pixelBuffer: pixelBuffer!)

        // Add first pass
        passTex.append(inputTex)
        passImg.append(CIImage(mtlTexture: inputTex)!)
        commandBuffer = commandQueue?.makeCommandBuffer()

        // Effect texture settings
        let mpsScale = MPSImageLanczosScale(device: device!)
        let desc = MTLTextureDescriptor()
        desc.depth = 1
        desc.allowGPUOptimizedContents = true
        desc.pixelFormat = .rgba16Float
        desc.usage = [.renderTarget, .shaderRead, .shaderWrite]
        desc.width = Int(w / 2)
        desc.height = Int(h / 2)

        // Queue up scale kernels for the remaining passes, halving the size each time
        for k in 0..<scales.count {
            let tex = device!.makeTexture(descriptor: desc)!
            let prevTex = passTex[k]

            mpsScale.encode(commandBuffer: commandBuffer!, sourceTexture: prevTex, destinationTexture: tex)

            passTex.append(tex)
            passImg.append(CIImage(mtlTexture: tex)!)

            desc.width /= 2
            desc.height /= 2
        }

        // Execute scale kernels and wait for them to finish
        commandBuffer!.commit()
        commandBuffer!.waitUntilCompleted()

        // Tell Core Image that each output region depends on the same region of the inputs
        let roiCallback: CIKernelROICallback = { _, rect -> CGRect in
            return rect
        }

        let outImg = self.kernel.apply(extent: inputExtent,
                                       roiCallback: roiCallback,
                                       arguments: [passImg[0], passImg[1], passImg[2], 0, 0])

        CVPixelBufferUnlockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags(rawValue: 0))

        return outImg?.oriented(.downMirrored)
    }
    

    func PixelBufferToMTLTexture(pixelBuffer: CVPixelBuffer) -> MTLTexture {
        let width = CVPixelBufferGetWidth(pixelBuffer)
        let height = CVPixelBufferGetHeight(pixelBuffer)
        // The buffer was created as kCVPixelFormatType_32BGRA, so the matching Metal format is .bgra8Unorm
        let format: MTLPixelFormat = .bgra8Unorm
        var textureRef: CVMetalTexture?
        let status = CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache!, pixelBuffer, nil, format, width, height, 0, &textureRef)

        guard status == kCVReturnSuccess, let texture = CVMetalTextureGetTexture(textureRef!) else {
            fatalError("Failed to create a Metal texture from the pixel buffer")
        }
        return texture
    }

}


Upvotes: 1

Views: 575

Answers (1)

Frank Rupprecht

Reputation: 10408

It's really not advisable to do your own custom Metal processing inside a CIFilter's outputImage method.

Some background: A CIImage is basically just a recipe for creating an image, i.e., it stores all the instructions that should be performed during rendering. When you apply a CIFilter to a CIImage (i.e., setting it as the inputImage of the filter and then getting the outputImage), you get a new image that contains the instructions of the input image plus the instructions added by this filter. An "instruction" here usually means applying a CIKernel with the image as one of the parameters.
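For example, nothing in this snippet performs any GPU work yet; each call just extends the recipe (a minimal sketch using a built-in filter, not your kernel):

let recipe = inputImage
    .applyingFilter("CIGaussianBlur", parameters: [kCIInputRadiusKey: 5.0])
    .transformed(by: CGAffineTransform(scaleX: 0.5, y: 0.5))
// `recipe` is still just a description of work; no pixels have been touched.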

The real processing happens later, when you tell a CIContext to create an actual image from a CIImage. Core Image will then look at the instruction graph stored in the image, optimize it, allocate any intermediate resources it needs, and queue the actual work on the GPU.
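For example, only a render call like this one actually evaluates the graph (a sketch; in practice you create the context once and reuse it):

let context = CIContext()
// This is where Core Image optimizes the recipe and runs the kernels on the GPU.
let rendered = context.createCGImage(recipe, from: recipe.extent)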

The last part is important here: ideally you let Core Image do all the resource allocation, because it knows best what to cache and what to (re-)use when. In your filter you are doing that manually, and at the wrong time: you are creating an additional CIContext (which is not the one that will render the outputImage) and manually creating buffers and Metal resources, thereby disrupting the natural Core Image "flow".

I would recommend the following:

  • You don't need MPSImageLanczosScale. The built-in CILanczosScaleTransform filter does exactly the same thing.
  • Don't employ your own Metal pipeline. Instead, apply other CIFilters to the input and pass their results to your kernel.
  • Don't create any buffers, contexts, or Metal objects in your filter. Let Core Image handle that for you.

Maybe like so:

override var outputImage: CIImage? {
    guard let inputImage = self.inputImage else { return nil }
    let inputExtent = inputImage.extent

    let scaleFilter = CIFilter.lanczosScaleTransform()
    scaleFilter.inputImage = inputImage
    scaleFilter.scale = 0.5

    var passImg = [CIImage]()
    for _ in scales {
        passImg.append(scaleFilter.outputImage!)
        scaleFilter.scale /= 2.0
    }

    let roiCallback: CIKernelROICallback = { _, rect -> CGRect in
        return rect
    }

    let outImg = self.kernel.apply(extent: inputExtent,
                                   roiCallback: roiCallback,
                                   arguments: [passImg[0], passImg[1], passImg[2], 0, 0])

    return outImg?.oriented(.downMirrored)
}

(You need to import CoreImage.CIFilterBuiltins to get CIFilter.lanczosScaleTransform().)

Upvotes: 2
