maxjvh

Reputation: 224

Metal GPU frame time behaves unintuitively

i've run into an interesting performance issue with Metal in my own app, which i've been able to reproduce by making only small adjustments to this example project. my view has a size of roughly 1600x900 and looks like this:

[image of the view]

there are two draw calls per frame, one for the background and one for the line. the background is made up of 4 vertices and the line is around 2000 vertices. when the scene is drawn like above, Xcode's GPU frame capture tells me that the entire frame takes ~4 ms (!). some observations:

this doesn't make sense to me. why do the changes described above have such a drastic effect on the frame time? it's a 100x difference.

i'm running the code on a 2018 Mac mini (with Intel UHD Graphics 630 1536 MB), in case that is important.


here are the changes made to the demo project:

  1. create two MTLBuffers during initialisation
AAPLVertex quadVertices[] = { ... 4 vertices omitted ... };
quadBuffer = [_device newBufferWithBytes:quadVertices length:4 * sizeof(AAPLVertex) options:MTLResourceStorageModeManaged];

AAPLVertex dataVertices[] = { ... ~2000 vertices omitted ... };
dataBuffer = [_device newBufferWithBytes:dataVertices length:2000 * sizeof(AAPLVertex) options:MTLResourceStorageModeManaged];
  2. draw both buffers in drawInMTKView:
[renderEncoder setVertexBuffer:quadBuffer offset:0 atIndex:AAPLVertexInputIndexVertices];
[renderEncoder drawPrimitives:MTLPrimitiveTypeTriangleStrip vertexStart:0 vertexCount:4];

[renderEncoder setVertexBuffer:dataBuffer offset:0 atIndex:AAPLVertexInputIndexVertices];
[renderEncoder drawPrimitives:MTLPrimitiveTypeTriangleStrip vertexStart:0 vertexCount:2000];
  3. turn on 8x MSAA: mtkView.sampleCount = 8; and pipelineStateDescriptor.sampleCount = 8;

  4. change the render pass's load action to MTLLoadActionLoad: renderPassDescriptor.colorAttachments[0].loadAction = MTLLoadActionLoad;

edit: the project is available on my GitHub.

edit 2: i ran the example project on a 2020 M1 MacBook and wasn't able to reproduce any of the bullet points there. the total frame time was around 100 µs for the base case. however, i had to use an MSAA factor of 4, since M1s apparently don't support 8.


to be transparent, i've also asked this question on the Apple Developer forums: https://developer.apple.com/forums/thread/695245 (i hope that's ok)

Upvotes: 5

Views: 553

Answers (2)

Hamid Yusifli

Reputation: 10137

Looks like it's a bug. I ran your sample project on the following processors:
M1, M1 Pro, M1 Max, Radeon Pro 5500M.

I wasn't able to reproduce your issue. I suggest you file a bug report.

Upvotes: 1

rojun

Reputation: 26

I can verify that it runs faster without 8x MSAA. However, 8x MSAA should not cause problems in such a simple project. Why would that be an option?

On an identical machine:

  • sampleCount 1 => 300 µs
  • sampleCount 2 => 600 µs
  • sampleCount 4 => 1.9 ms
  • sampleCount 8 => 3.7 ms

Upvotes: 0
