SaganRitual
SaganRitual

Reputation: 3203

Swift, macOS, mac with 2 GPUs, matrix operations work on one GPU, not on the other

I'm running macOS on my MacBook Pro (Retina, 15-inch, Mid 2015), which has two GPUs in it, according to "About this Mac" in the Apple menu. One GPU is an AMD Radeon R9 M370X 2 GB, the other is an Intel Iris Pro 1536 MB -- the standard chips, I guess? They're the chips that were in there when I bought it, nothing I added myself.

I'm using the Swift MPS library for matrix computations; it works great on the Intel GPU, but when I select the Radeon, I only ever get back zeros from every operation, with no error reported. I've looked around for documentation on it, but I can't find anything. The only clue I have so far is that the Radeon reports "not integrated" (or at least, I think it does, based on the sample code at Finding GPUs on macOS, which is about as useful as Apple's doc ever is, meaning not very). If I've read that page correctly, this is what my two GPUs are telling me.

Device Intel Iris Pro Graphics; caps: headful, not discrete, integrated, not external

Device AMD Radeon R9 M370X; caps: headful, discrete, not integrated, not external

I can't find any doc that would suggest what I'm doing wrong. I've been all over Apple's MPS documentation, to no avail. And as I say, the code works great on the Intel GPU, so I should think it would run on the Radeon too. I've run some downloadable diagnostic tools to check on the Radeon, but it doesn't show up in the menus of those tools. So I don't even know whether this is something I'm doing wrong in the code, or if the chip itself is broken.

Below is the code, which you can build as a console app by pasting into main.swift. Find the following line:

let device = MTLCopyAllDevices()[1]

I use [0] for the Intel, [1] for the Radeon, and you can see that the output is different, that is, all zeros for the Radeon. I suppose your mileage may vary depending on your machine. I welcome any input, cheers

import MetalPerformanceShaders

typealias MPSNumber = Float32

let MPSNumberSize = MemoryLayout<MPSNumber>.size
let MPSNumberTypeInGPU = MPSDataType.float32

class MPSNet {
    let commandBuffer: MTLCommandBuffer
    let commandQueue: MTLCommandQueue
    let device = MTLCopyAllDevices()[1]
    var neuronsInMatrix1: MPSMatrix?
    var neuronsInMatrix2: MPSMatrix?
    var neuronsOutMatrix: MPSMatrix?

    init() {
        guard let cq = device.makeCommandQueue() else { fatalError() }
        guard let cb = cq.makeCommandBuffer() else { fatalError() }

        commandQueue = cq
        commandBuffer = cb

        let cMatrices = 2
        let cRows = 1
        let cColumns = 3

        let sensoryInputs1: [MPSNumber] = [1, 2, 3]
        let sensoryInputs2: [MPSNumber] = [4, 5, 6]

        neuronsInMatrix1 = makeMatrix(device, sensoryInputs1)
        neuronsInMatrix2 = makeMatrix(device, sensoryInputs2)

        let rowStride = MPSMatrixDescriptor.rowBytes(fromColumns: cColumns, dataType: MPSNumberTypeInGPU)
        neuronsOutMatrix = makeMatrix(device, cRows, cColumnsOut: cColumns, rowStride: rowStride)

        let adder = MPSMatrixSum(
            device: device, count: cMatrices, rows: cRows, columns: cColumns, transpose: false
        )

        adder.encode(
            to: commandBuffer,
            sourceMatrices: [neuronsInMatrix1!, neuronsInMatrix2!],
            resultMatrix: neuronsOutMatrix!, scale: nil, offsetVector: nil,
            biasVector: nil, start: 0
        )

        commandBuffer.addCompletedHandler { _ in
            let motorOutputs = self.getComputeOutput(self.neuronsOutMatrix!)

            let discrete = !self.device.isLowPower && !self.device.isRemovable
            let caps = "\(self.device.isHeadless ? " headless" : " headful")" +
                       "\(discrete ? ", discrete" : ", not discrete")" +
                       "\(self.device.isLowPower ? ", integrated" : ", not integrated")" +
                       "\(self.device.isRemovable ? ", external" : ", not external")"

            print("Device \(self.device.name); caps:\(caps); motor outputs \(motorOutputs)")
        }
    }

    func compute() {
        commandBuffer.commit()
        commandBuffer.waitUntilCompleted()
    }
}

extension MPSNet {
    func getComputeOutput(_ matrix: MPSMatrix) -> [Double] {
        let rc = matrix.data.contents()
        return stride(from: 0, to: matrix.columns * MPSNumberSize, by: MPSNumberSize).map {
            offset in

            let rr = rc.load(fromByteOffset: offset, as: MPSNumber.self)

            return Double(rr)
        }
    }

    func loadMatrix(_ data: MTLBuffer, _ rawValues: [MPSNumber]) {
        let dContents = data.contents()

        zip(stride(from: 0, to: rawValues.count * MPSNumberSize, by: MPSNumberSize), rawValues).forEach { z in
            let (byteOffset, rawValue) = (z.0, MPSNumber(z.1))

            dContents.storeBytes(of: rawValue, toByteOffset: byteOffset, as: MPSNumber.self)
        }
    }

    func makeMatrix(_ device: MTLDevice, _ rawValues: [MPSNumber]) -> MPSMatrix {
        let rowStride = MPSMatrixDescriptor.rowBytes(
            fromColumns: rawValues.count, dataType: MPSNumberTypeInGPU
        )

        let descriptor = MPSMatrixDescriptor(
            dimensions: 1, columns: rawValues.count, rowBytes: rowStride,
            dataType: MPSNumberTypeInGPU
        )

        guard let inputBuffer = device.makeBuffer(
            length: descriptor.matrixBytes, options: MTLResourceOptions.storageModeManaged
        ) else { fatalError() }

        loadMatrix(inputBuffer, rawValues)

        return MPSMatrix(buffer: inputBuffer, descriptor: descriptor)
    }

    func makeMatrix(_ device: MTLDevice, _ cRowsOut: Int, cColumnsOut: Int, rowStride: Int) -> MPSMatrix {
        let matrixDescriptor = MPSMatrixDescriptor(
            dimensions: cRowsOut, columns: cColumnsOut,
            rowBytes: rowStride, dataType: MPSNumberTypeInGPU
        )

        return MPSMatrix(device: device, descriptor: matrixDescriptor)
    }
}

let net = MPSNet()
net.compute()

Upvotes: 2

Views: 366

Answers (2)

SaganRitual
SaganRitual

Reputation: 3203

The problem lies in the storage mode of your matrix buffers. You're using MTLResourceOptions.storageModeManaged, which tells Metal that you want to manage synchronization of the memory being shared between the CPU and GPU. As mentioned in another answer here, you must use MPSMatrix.synchronize(on: MTLCommandBuffer) after a GPU operation before you try to read the data with the CPU. But you must also synchronize in the other direction, ie, after CPU operations before you commit the command to the GPU, using MTLBuffer.didModifyRange(_: Range).

Alternatively, you can use the shared storage mode, MTLResourceOptions.storageModeShared, which takes care of the synchronization for you.

See Synchronizing a Managed Resource in the Apple doc for details.

Below is a working version of your example using the managed storage mode as you have. Notice the differences in function MPSNet.compute(). You can leave that stuff out and just change the storage mode when creating the MTLBuffers for your matrices if your app is ok to use the shared storage mode.

import MetalPerformanceShaders

typealias MPSNumber = Float32

let MPSNumberSize = MemoryLayout<MPSNumber>.size
let MPSNumberTypeInGPU = MPSDataType.float32

class MPSNet {
    let commandBuffer: MTLCommandBuffer
    let commandQueue: MTLCommandQueue
    let device = MTLCopyAllDevices()[1]
    var neuronsInMatrix1: MPSMatrix?
    var neuronsInMatrix2: MPSMatrix?
    var neuronsOutMatrix: MPSMatrix?

    init() {
        guard let cq = device.makeCommandQueue() else { fatalError() }
        guard let cb = cq.makeCommandBuffer() else { fatalError() }

        commandQueue = cq
        commandBuffer = cb

        let cMatrices = 2
        let cRows = 1
        let cColumns = 3

        let sensoryInputs1: [MPSNumber] = [1, 2, 3]
        let sensoryInputs2: [MPSNumber] = [4, 5, 6]

        neuronsInMatrix1 = makeMatrix(device, sensoryInputs1)
        neuronsInMatrix2 = makeMatrix(device, sensoryInputs2)

        let rowStride = MPSMatrixDescriptor.rowBytes(fromColumns: cColumns, dataType: MPSNumberTypeInGPU)
        neuronsOutMatrix = makeMatrix(device, cRows, cColumnsOut: cColumns, rowStride: rowStride)

        let adder = MPSMatrixSum(
            device: device, count: cMatrices, rows: cRows, columns: cColumns, transpose: false
        )

        adder.encode(
            to: commandBuffer,
            sourceMatrices: [neuronsInMatrix1!, neuronsInMatrix2!],
            resultMatrix: neuronsOutMatrix!, scale: nil, offsetVector: nil,
            biasVector: nil, start: 0
        )

        commandBuffer.addCompletedHandler { _ in
            let motorOutputs = self.getComputeOutput(self.neuronsOutMatrix!)

            let discrete = !self.device.isLowPower && !self.device.isRemovable
            let caps = "\(self.device.isHeadless ? " headless" : " headful")" +
                       "\(discrete ? ", discrete" : ", not discrete")" +
                       "\(self.device.isLowPower ? ", integrated" : ", not integrated")" +
                       "\(self.device.isRemovable ? ", external" : ", not external")"

            print("Device \(self.device.name); caps:\(caps); motor outputs \(motorOutputs)")
        }
    }

    func compute() {
        for matrix in [neuronsInMatrix1!, neuronsInMatrix2!, neuronsOutMatrix!] {
            let matrixData = matrix.data
            matrixData.didModifyRange(0..<matrixData.length)

            matrix.synchronize(on: commandBuffer)
        }

        commandBuffer.commit()
    }
}

extension MPSNet {
    func getComputeOutput(_ matrix: MPSMatrix) -> [Double] {
        let rc = matrix.data.contents()
        return stride(from: 0, to: matrix.columns * MPSNumberSize, by: MPSNumberSize).map {
            offset in

            let rr = rc.load(fromByteOffset: offset, as: MPSNumber.self)

            return Double(rr)
        }
    }

    func loadMatrix(_ data: MTLBuffer, _ rawValues: [MPSNumber]) {
        let dContents = data.contents()

        zip(stride(from: 0, to: rawValues.count * MPSNumberSize, by: MPSNumberSize), rawValues).forEach { z in
            let (byteOffset, rawValue) = (z.0, MPSNumber(z.1))

            dContents.storeBytes(of: rawValue, toByteOffset: byteOffset, as: MPSNumber.self)
        }
    }

    func makeMatrix(_ device: MTLDevice, _ rawValues: [MPSNumber]) -> MPSMatrix {
        let rowStride = MPSMatrixDescriptor.rowBytes(
            fromColumns: rawValues.count, dataType: MPSNumberTypeInGPU
        )

        let descriptor = MPSMatrixDescriptor(
            dimensions: 1, columns: rawValues.count, rowBytes: rowStride,
            dataType: MPSNumberTypeInGPU
        )

        guard let inputBuffer = device.makeBuffer(
            length: descriptor.matrixBytes, options: MTLResourceOptions.storageModeManaged
        ) else { fatalError() }

        loadMatrix(inputBuffer, rawValues)

        return MPSMatrix(buffer: inputBuffer, descriptor: descriptor)
    }

    func makeMatrix(_ device: MTLDevice, _ cRowsOut: Int, cColumnsOut: Int, rowStride: Int) -> MPSMatrix {
        let matrixDescriptor = MPSMatrixDescriptor(
            dimensions: cRowsOut, columns: cColumnsOut,
            rowBytes: rowStride, dataType: MPSNumberTypeInGPU
        )

        return MPSMatrix(device: device, descriptor: matrixDescriptor)
    }
}

let net = MPSNet()
net.compute()

Upvotes: 0

Ian Ollmann
Ian Ollmann

Reputation: 1592

It appears you have failed to use -[MPSMatrix synchronizeOnCommandBuffer:]. On discrete devices, some explicit synchronization is required before the data will come back from the GPU.

Upvotes: 1

Related Questions