sevensevens

Reputation: 1750

Metal rendering really slow - how to speed it up

I have a working Metal application that is extremely slow and needs to run faster. I believe the problem is that I am creating too many MTLCommandBuffer objects.

The reason I am creating so many MTLCommandBuffer objects is that I need to send different uniform values to the pixel shader. I've pasted a snippet of code below to illustrate the problem.

  for (int obj_i = 0 ; obj_i < n ; ++obj_i)
  {
     // I create one render command buffer per object I draw so I can use  different uniforms
     id <MTLCommandBuffer> mtlCommandBuffer = [metal_info.g_commandQueue commandBuffer];
     id <MTLRenderCommandEncoder> renderCommand = [mtlCommandBuffer renderCommandEncoderWithDescriptor:<#(MTLRenderPassDescriptor *)#>];

     // glossing over details, but this call has per object specific data
     memcpy([global_uniform_buffer contents], per_object_data, sizeof(per_data_object));

     [renderCommand setVertexBuffer:object_vertices  offset:0 atIndex:0];
     // I am reusing a single buffer for all shader calls
     // this is killing performance
     [renderCommand setVertexBuffer:global_uniform_buffer offset:0 atIndex:1];

     [renderCommand drawIndexedPrimitives:MTLPrimitiveTypeTriangle
                               indexCount:per_object_index_count
                               indexType:MTLIndexTypeUInt32
                             indexBuffer:indicies
                       indexBufferOffset:0];
     [renderCommand endEncoding];
     [mtlCommandBuffer presentDrawable:frameDrawable];
     [mtlCommandBuffer commit];
}  

The above code draws as expected, but is EXTREMELY slow. I'm guessing that is because there is a better way to force pixel shader evaluation than creating an MTLCommandBuffer per object.

I've considered simply allocating a buffer much larger than is needed for a single shader pass and using offsets to queue up several draw calls in one render command encoder, then executing them. This method seems pretty unorthodox, and I want to make sure I'm solving the issue of needing to send custom data per object in a Metal-friendly way.

What is the fastest way to render using multiple passes of the same pixel/vertex shader with per call custom uniform data?

Upvotes: 3

Views: 2317

Answers (1)

Muzza

Reputation: 1256

Don't reuse the same uniform buffer for every object. Doing that destroys all parallelism between the CPU and GPU and causes regular sync points.

Instead, make a separate uniform buffer for each object you are going to render in the frame. In fact you should really create 2 per object and alternate between them each frame so that the GPU can be rendering the last frame whilst you are preparing the next frame on the CPU.

After you do that, you simply refactor your loop so the command buffer and render command work are done once per frame. Your loop should only consist of copying the uniform data, setting the vertex buffer and calling draw primitive.
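
A rough sketch of what that refactored loop might look like, not a drop-in implementation. `DrawObject`, `EncodeFrame`, `frameIndex` and `inFlight` are hypothetical names introduced for illustration. The semaphore is an assumption on top of the advice above: it is one common way to make sure the CPU never overwrites a uniform buffer the GPU may still be reading. The sketch also assumes the uniform buffers were created with shared storage and are at least as large as the data copied into them.

```objc
#import <Metal/Metal.h>
#include <string.h>

// Hypothetical per-object record (not from the question).
@interface DrawObject : NSObject
@property (nonatomic, strong) id<MTLBuffer> vertices;            // object_vertices
@property (nonatomic, strong) id<MTLBuffer> indices;             // indicies
@property (nonatomic) NSUInteger indexCount;                     // per_object_index_count
@property (nonatomic, strong) NSData *uniformData;               // per_object_data
@property (nonatomic, strong) NSArray<id<MTLBuffer>> *uniforms;  // exactly 2 buffers
@end

@implementation DrawObject
@end

// Encodes one frame: one command buffer, one encoder, many draws.
// `inFlight` is assumed to be created once with dispatch_semaphore_create(2).
static void EncodeFrame(id<MTLCommandQueue> queue,
                        MTLRenderPassDescriptor *passDescriptor,
                        id<MTLRenderPipelineState> pipeline,
                        id<MTLDrawable> drawable,
                        NSArray<DrawObject *> *objects,
                        NSUInteger frameIndex,
                        dispatch_semaphore_t inFlight)
{
    // Wait until the frame that last used this set of uniform buffers is done.
    dispatch_semaphore_wait(inFlight, DISPATCH_TIME_FOREVER);

    id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];
    id<MTLRenderCommandEncoder> encoder =
        [commandBuffer renderCommandEncoderWithDescriptor:passDescriptor];
    [encoder setRenderPipelineState:pipeline];

    // Alternate between the two uniform buffers of each object every frame.
    NSUInteger slot = frameIndex % 2;

    for (DrawObject *obj in objects) {
        id<MTLBuffer> uniforms = obj.uniforms[slot];

        // Assumes shared storage and uniforms.length >= uniformData.length.
        memcpy(uniforms.contents, obj.uniformData.bytes, obj.uniformData.length);

        [encoder setVertexBuffer:obj.vertices offset:0 atIndex:0];
        [encoder setVertexBuffer:uniforms offset:0 atIndex:1];
        [encoder drawIndexedPrimitives:MTLPrimitiveTypeTriangle
                            indexCount:obj.indexCount
                             indexType:MTLIndexTypeUInt32
                           indexBuffer:obj.indices
                     indexBufferOffset:0];
    }

    [encoder endEncoding];

    // Present and commit once per frame, not once per object.
    [commandBuffer presentDrawable:drawable];
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> completed) {
        dispatch_semaphore_signal(inFlight);
    }];
    [commandBuffer commit];
}
```

The caller would increment `frameIndex` after each call. With the semaphore starting at 2, the CPU can prepare one frame while the GPU renders the previous one, which matches the two-buffers-per-object scheme.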

Upvotes: 7
