devin287

Reputation: 31

How to normalize pixel values of a UIImage in Swift?

We are attempting to normalize a UIImage so that it can be passed correctly into a Core ML model.

We retrieve the RGB values from each pixel by first initializing a [CGFloat] array called rawData, with one slot per pixel for each of red, green, blue, and alpha. bitmapInfo describes how the raw pixel values of the original UIImage are laid out and is passed as the bitmapInfo parameter of context, a CGContext variable. We later use context to draw a CGImage, and finally convert the normalized CGImage back into a UIImage.

Using a nested for loop iterating over the x and y coordinates, we find the minimum and maximum component values (across red, green, blue, and alpha, read from the rawData array) over all pixels. A bound variable is set to terminate the loop early, since otherwise it produces an out-of-range error.

range indicates the range of possible RGB values (i.e. the difference between the maximum and minimum color values).

Using the equation to normalize each pixel value:

A = Image
curPixel = current pixel component (R, G, B, or alpha)
NormalizedPixel = (curPixel - minPixel(A)) / range

and a similarly designed nested for loop, we walk through the rawData array and modify each pixel's components according to this normalization.

Most of our code is from:

  1. UIImage to UIColor array of pixel colors
  2. Change color of certain pixels in a UIImage
  3. https://gist.github.com/pimpapare/e8187d82a3976b851fc12fe4f8965789

We use CGFloat instead of UInt8 because the normalized pixel values should be real numbers between 0 and 1, not just 0 or 1.

func normalize() -> UIImage?{

    let colorSpace = CGColorSpaceCreateDeviceRGB()

    guard let cgImage = cgImage else {
        return nil
    }

    let width = Int(size.width)
    let height = Int(size.height)

    var rawData = [CGFloat](repeating: 0, count: width * height * 4)
    let bytesPerPixel = 4
    let bytesPerRow = bytesPerPixel * width
    let bytesPerComponent = 8

    let bitmapInfo = CGImageAlphaInfo.premultipliedLast.rawValue | CGBitmapInfo.byteOrder32Big.rawValue & CGBitmapInfo.alphaInfoMask.rawValue

    let context = CGContext(data: &rawData,
                            width: width,
                            height: height,
                            bitsPerComponent: bytesPerComponent,
                            bytesPerRow: bytesPerRow,
                            space: colorSpace,
                            bitmapInfo: bitmapInfo)

    let drawingRect = CGRect(origin: .zero, size: CGSize(width: width, height: height))
    context?.draw(cgImage, in: drawingRect)

    let bound = rawData.count

    //find minimum and maximum
    var minPixel: CGFloat = 1.0
    var maxPixel: CGFloat = 0.0

    for x in 0..<width {
        for y in 0..<height {

            let byteIndex = (bytesPerRow * x) + y * bytesPerPixel

            if(byteIndex > bound - 4){
                break
            }
            minPixel = min(CGFloat(rawData[byteIndex]), minPixel)
            minPixel = min(CGFloat(rawData[byteIndex + 1]), minPixel)
            minPixel = min(CGFloat(rawData[byteIndex + 2]), minPixel)

            minPixel = min(CGFloat(rawData[byteIndex + 3]), minPixel)


            maxPixel = max(CGFloat(rawData[byteIndex]), maxPixel)
            maxPixel = max(CGFloat(rawData[byteIndex + 1]), maxPixel)
            maxPixel = max(CGFloat(rawData[byteIndex + 2]), maxPixel)

            maxPixel = max(CGFloat(rawData[byteIndex + 3]), maxPixel)
        }
    }

    let range = maxPixel - minPixel
    print("minPixel: \(minPixel)")
    print("maxPixel : \(maxPixel)")
    print("range: \(range)")

    for x in 0..<width {
        for y in 0..<height {
            let byteIndex = (bytesPerRow * x) + y * bytesPerPixel

            if(byteIndex > bound - 4){
                break
            }
            rawData[byteIndex] = (CGFloat(rawData[byteIndex]) - minPixel) / range
            rawData[byteIndex+1] = (CGFloat(rawData[byteIndex+1]) - minPixel) / range
            rawData[byteIndex+2] = (CGFloat(rawData[byteIndex+2]) - minPixel) / range

            rawData[byteIndex+3] = (CGFloat(rawData[byteIndex+3]) - minPixel) / range

        }
    }

    let cgImage0 = context!.makeImage()
    return UIImage.init(cgImage: cgImage0!)
}

Before normalization, we expect the pixel values to range from 0 to 255, and after normalization, from 0 to 1.

The normalization formula does map pixel values to values between 0 and 1. But when we print the pixel values before normalization (by simply adding print statements inside the loop) to verify that we are reading the raw pixel values correctly, we find that their range is off. For example, one pixel has a value of 3.506e+305 (larger than 255). We think we are getting the raw pixel values wrong from the start.

We are not familiar with image processing in Swift, and we are not sure whether the whole normalization process is right. Any help would be appreciated!

Upvotes: 3

Views: 3033

Answers (2)

us_david

Reputation: 4917

There may be a better way to do the normalization, which is through the Core ML model itself when you convert a PyTorch or TensorFlow model to Core ML. It's done when using coremltools for model conversion: when the input type is specified, a scale (as well as a bias) factor can be specified to scale the input image:

import coremltools as ct
input_shape = (1, 3, 256, 256)
# Set the image scale and bias for input image preprocessing
scale = 1/(0.226*255.0)
bias = [- 0.485/(0.229) , - 0.456/(0.224), - 0.406/(0.225)]

image_input = ct.ImageType(name="input_1",
                           shape=input_shape,
                           scale=scale, bias=bias,
                           color_layout=ct.colorlayout.RGB)

There is more info at the coremltools site. If your model was produced by some means other than a coremltools conversion, this method would not apply to you. However, in most cases we train the model in PyTorch or TF and run inference on iPhone, so this is the path that makes more sense than manipulating the pixels in Swift with a CVPixelBuffer.
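
For completeness, here is a minimal Swift sketch of what inference might then look like on the device (the MyModel class name is hypothetical, standing in for whatever class Xcode generates from your .mlmodel, and a classifier output is assumed). Because the scale and bias are baked into the model, no manual pixel normalization is needed on the Swift side:

import UIKit
import Vision
import CoreML

// Illustrative only: "MyModel" is a placeholder for the Xcode-generated model class,
// and a classifier output is assumed. The conversion-time scale/bias handles the
// normalization, so the image is handed to Vision as-is.
func classify(_ image: UIImage, completion: @escaping ([VNClassificationObservation]?) -> Void) {
    guard let cgImage = image.cgImage,
          let visionModel = try? VNCoreMLModel(for: MyModel(configuration: MLModelConfiguration()).model)
    else {
        completion(nil)
        return
    }

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        completion(request.results as? [VNClassificationObservation])
    }
    // Vision resizes/crops the image to the model's expected input dimensions.
    request.imageCropAndScaleOption = .scaleFill

    let handler = VNImageRequestHandler(cgImage: cgImage)
    try? handler.perform([request])
}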

Upvotes: 0

Rob

Reputation: 437542

A couple of observations:

  1. Your rawData is a floating point (CGFloat) array, but your context isn’t populating it with floating point data, but rather with UInt8 data. If you want a floating point buffer, build a floating point context with CGBitmapInfo.floatComponents and tweak the context parameters accordingly. E.g.:

    func normalize() -> UIImage? {
        let colorSpace = CGColorSpaceCreateDeviceRGB()
    
        guard let cgImage = cgImage else {
            return nil
        }
    
        let width = cgImage.width
        let height = cgImage.height
    
        var rawData = [Float](repeating: 0, count: width * height * 4)
        let bytesPerPixel = 16
        let bytesPerRow = bytesPerPixel * width
        let bitsPerComponent = 32
    
        let bitmapInfo = CGImageAlphaInfo.premultipliedLast.rawValue | CGBitmapInfo.floatComponents.rawValue | CGBitmapInfo.byteOrder32Little.rawValue
    
        guard let context = CGContext(data: &rawData,
                                      width: width,
                                      height: height,
                                      bitsPerComponent: bitsPerComponent,
                                      bytesPerRow: bytesPerRow,
                                      space: colorSpace,
                                      bitmapInfo: bitmapInfo) else { return nil }
    
        let drawingRect = CGRect(origin: .zero, size: CGSize(width: width, height: height))
        context.draw(cgImage, in: drawingRect)
    
        var maxValue: Float = 0
        var minValue: Float = 1
    
        for pixel in 0 ..< width * height {
            let baseOffset = pixel * 4
            for offset in baseOffset ..< baseOffset + 3 {
                let value = rawData[offset]
                if value > maxValue { maxValue = value }
                if value < minValue { minValue = value }
            }
        }
        let range = maxValue - minValue
        guard range > 0 else { return nil }
    
        for pixel in 0 ..< width * height {
            let baseOffset = pixel * 4
            for offset in baseOffset ..< baseOffset + 3 {
                rawData[offset] = (rawData[offset] - minValue) / range
            }
        }
    
        return context.makeImage().map { UIImage(cgImage: $0, scale: scale, orientation: imageOrientation) }
    }
    
  2. But this begs the question of why you’d bother with floating point data. If you were returning this floating point data back to your ML model, then I can imagine it might be useful, but you’re just creating a new image. Because of that, you also have the opportunity to just retrieve the UInt8 data, do the floating point math, and then update the UInt8 buffer, and create the image from that. Thus:

    func normalize() -> UIImage? {
        let colorSpace = CGColorSpaceCreateDeviceRGB()
    
        guard let cgImage = cgImage else {
            return nil
        }
    
        let width = cgImage.width
        let height = cgImage.height
    
        var rawData = [UInt8](repeating: 0, count: width * height * 4)
        let bytesPerPixel = 4
        let bytesPerRow = bytesPerPixel * width
        let bitsPerComponent = 8
    
        let bitmapInfo = CGImageAlphaInfo.premultipliedLast.rawValue
    
        guard let context = CGContext(data: &rawData,
                                      width: width,
                                      height: height,
                                      bitsPerComponent: bitsPerComponent,
                                      bytesPerRow: bytesPerRow,
                                      space: colorSpace,
                                      bitmapInfo: bitmapInfo) else { return nil }
    
        let drawingRect = CGRect(origin: .zero, size: CGSize(width: width, height: height))
        context.draw(cgImage, in: drawingRect)
    
        var maxValue: UInt8 = 0
        var minValue: UInt8 = 255
    
        for pixel in 0 ..< width * height {
            let baseOffset = pixel * 4
            for offset in baseOffset ..< baseOffset + 3 {
                let value = rawData[offset]
                if value > maxValue { maxValue = value }
                if value < minValue { minValue = value }
            }
        }
        let range = Float(maxValue - minValue)
        guard range > 0 else { return nil }
    
        for pixel in 0 ..< width * height {
            let baseOffset = pixel * 4
            for offset in baseOffset ..< baseOffset + 3 {
                rawData[offset] = UInt8(Float(rawData[offset] - minValue) / range * 255)
            }
        }
    
        return context.makeImage().map { UIImage(cgImage: $0, scale: scale, orientation: imageOrientation) }
    }
    

    It just depends upon whether you really need this floating point buffer for your ML model (in which case, you might return the array of floats from the first example, rather than creating a new image; see the sketch after this list) or whether the goal was just to create the normalized UIImage.

    I benchmarked this, and it was a tad faster on my iPhone XS Max than the floating point rendition, but takes a quarter of the memory (e.g. a 2000×2000px image takes 16 MB with UInt8, but 64 MB with Float).

  3. Finally, I should mention that vImage has a highly optimized function, vImageContrastStretch_ARGB8888, that does something very similar to what we’ve done above. Just import Accelerate and then you can do something like:

    func normalize3() -> UIImage? {
        let colorSpace = CGColorSpaceCreateDeviceRGB()
    
        guard let cgImage = cgImage else { return nil }
    
        var format = vImage_CGImageFormat(bitsPerComponent: UInt32(cgImage.bitsPerComponent),
                                          bitsPerPixel: UInt32(cgImage.bitsPerPixel),
                                          colorSpace: Unmanaged.passRetained(colorSpace),
                                          bitmapInfo: cgImage.bitmapInfo,
                                          version: 0,
                                          decode: nil,
                                          renderingIntent: cgImage.renderingIntent)
    
        var source = vImage_Buffer()
        var result = vImageBuffer_InitWithCGImage(
            &source,
            &format,
            nil,
            cgImage,
            vImage_Flags(kvImageNoFlags))
    
        guard result == kvImageNoError else { return nil }
    
        defer { free(source.data) }
    
        var destination = vImage_Buffer()
        result = vImageBuffer_Init(
            &destination,
            vImagePixelCount(cgImage.height),
            vImagePixelCount(cgImage.width),
            32,
            vImage_Flags(kvImageNoFlags))
    
        guard result == kvImageNoError else { return nil }
    
        result = vImageContrastStretch_ARGB8888(&source, &destination, vImage_Flags(kvImageNoFlags))
        guard result == kvImageNoError else { return nil }
    
        defer { free(destination.data) }
    
        return vImageCreateCGImageFromBuffer(&destination, &format, nil, nil, vImage_Flags(kvImageNoFlags), nil).map {
            UIImage(cgImage: $0.takeRetainedValue(), scale: scale, orientation: imageOrientation)
        }
    }
    

    While this employs a slightly different algorithm, it’s worth considering, because in my benchmarking, on my iPhone XS Max it was over 5 times as fast as the floating point rendition.
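
If the floating point buffer really is destined for your ML model, the sketch below (purely illustrative, and not part of the examples above) shows one way to pack the normalized floats from the first example’s rawData into an MLMultiArray. The planar [1, 3, height, width] float32 input shape is an assumption, so check it against your model’s actual input description:

    import CoreML

    // Illustrative only: copies the normalized RGBA-interleaved floats into a planar
    // [1, 3, height, width] float32 MLMultiArray, dropping the alpha channel.
    func makeInputArray(from rawData: [Float], width: Int, height: Int) -> MLMultiArray? {
        let shape: [NSNumber] = [1, 3, NSNumber(value: height), NSNumber(value: width)]
        guard let array = try? MLMultiArray(shape: shape, dataType: .float32) else { return nil }

        for y in 0 ..< height {
            for x in 0 ..< width {
                let sourceOffset = (y * width + x) * 4              // RGBA stride of 4
                for channel in 0 ..< 3 {                            // skip alpha
                    // Linear index into the planar [1, 3, height, width] layout.
                    let destination = (channel * height + y) * width + x
                    array[destination] = NSNumber(value: rawData[sourceOffset + channel])
                }
            }
        }
        return array
    }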


A few unrelated observations:

  1. Your code snippet is normalizing the alpha channel, too. I’m not sure you’d want to do that. Usually colors and alpha channels are independent. Above I assume you really wanted to normalize just the color channels. If you want to normalize the alpha channel, too, then you might compute a separate min-max range for the alpha channel and process it separately (see the sketch after this list). But it doesn’t make much sense to normalize the alpha channel with the same range of values as the color channels (or vice versa).

  2. Rather than using the UIImage width and height, I’m using the values from the CGImage. This is an important distinction in case your images don’t have a scale of 1.

  3. You might want to consider early-exit if, for example, the range was already 0-255 (i.e. no normalization needed).
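
For the first observation, here is a purely illustrative variation on the UInt8 example above that normalizes the color channels and the alpha channel with separate min-max ranges:

    // Illustrative only: rawData is the RGBA UInt8 buffer from the second example above.
    func normalizeColorsAndAlphaSeparately(_ rawData: inout [UInt8], width: Int, height: Int) {
        var minColor: UInt8 = 255, maxColor: UInt8 = 0
        var minAlpha: UInt8 = 255, maxAlpha: UInt8 = 0

        // Find separate min/max ranges for the color channels and the alpha channel.
        for pixel in 0 ..< width * height {
            let baseOffset = pixel * 4
            for offset in baseOffset ..< baseOffset + 3 {
                minColor = min(minColor, rawData[offset])
                maxColor = max(maxColor, rawData[offset])
            }
            minAlpha = min(minAlpha, rawData[baseOffset + 3])
            maxAlpha = max(maxAlpha, rawData[baseOffset + 3])
        }

        let colorRange = Float(maxColor - minColor)
        let alphaRange = Float(maxAlpha - minAlpha)

        // Stretch each group with its own range, skipping groups that are constant.
        for pixel in 0 ..< width * height {
            let baseOffset = pixel * 4
            if colorRange > 0 {
                for offset in baseOffset ..< baseOffset + 3 {
                    rawData[offset] = UInt8(Float(rawData[offset] - minColor) / colorRange * 255)
                }
            }
            if alphaRange > 0 {
                rawData[baseOffset + 3] = UInt8(Float(rawData[baseOffset + 3] - minAlpha) / alphaRange * 255)
            }
        }
    }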

Upvotes: 3
