utahwithak

Abysmal Metal rendering speed

I have been trying to speed up rendering in my game, but I've only managed to make it slower, and I'm stumped as to how that happened. I'm drawing terrain in a 2D game, so imagine long strips of various textures across the screen.

The prior implementation grouped the triangles by texture, iterated over the textures, and drew each texture's triangles in separate batches. As you can imagine, this resulted in far more draw calls, and it forced some of the uniforms to be switched several times during the frame whenever the terrain wrapped around the edge.

So I packed all the textures together, so that I wouldn't have to swap them out and could draw large swaths of triangles at once. This lowered the draw-call count from roughly 4000 to fewer than 1000 (as measured by Xcode's frame capture tool), but it slowed the frame down immensely: the same scene is rendered with the same number of triangles, yet the frame rate dropped from around 40 fps to 10 fps.
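Packing several textures into one like this is usually called a texture atlas, and each vertex's UV then has to point into that texture's sub-rectangle. The question does not say how the textures were laid out, so the following is only a minimal sketch assuming a hypothetical atlas made of equally sized tiles in a grid; `GridAtlas` and `atlasUV(tile:localUV:)` are illustrative names, not part of any API.

```swift
/// Hypothetical atlas layout: `columns` x `rows` equally sized tiles,
/// numbered row by row starting at the top-left corner (UV origin).
struct GridAtlas {
    let columns: Int
    let rows: Int

    /// Maps a UV in [0, 1] x [0, 1] inside one tile to the matching UV in the atlas.
    func atlasUV(tile: Int, localUV: SIMD2<Float>) -> SIMD2<Float> {
        precondition(tile >= 0 && tile < columns * rows, "tile index out of range")
        // Size of one tile in normalized atlas coordinates.
        let tileSize = SIMD2<Float>(1 / Float(columns), 1 / Float(rows))
        // Top-left corner of the tile: column = tile % columns, row = tile / columns.
        let origin = SIMD2<Float>(Float(tile % columns), Float(tile / columns)) * tileSize
        // Scale the local UV down to the tile and offset it into place.
        return origin + localUV * tileSize
    }
}
```

For example, in a 2 x 2 atlas, `atlasUV(tile: 1, localUV: SIMD2<Float>(0, 0))` gives `(0.5, 0)`. One caveat of any atlas: a sampler's repeat address mode now wraps across the whole atlas rather than a single tile, and linear filtering near tile edges can pick up texels from neighbouring tiles.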

Before, there were more than a thousand simple drawPrimitives(.Triangle ...) calls, each drawing only 5 or 6 triangles; now there are only a few hundred calls, each drawing over 100 triangles.

For example, the frame capture reports that [drawPrimitives:3 vertexStart:1944 vertexCount:348 instanceCount:116] takes 1.93 ms (primitive type 3 is MTLPrimitiveTypeTriangle).

In the frame capture, these calls, each drawing 100 or so triangles, take about 2 ms! Why so long?! That strikes me as very odd, since it can draw a quad in roughly 3 microseconds; if the cost scaled linearly, I would expect this strip to take only about 0.2 ms.

The shaders are the same in both implementations, and they are very simple. I profiled the code that prepares the triangles, and it is faster than before. The only thing I can point to is the drawPrimitives calls themselves, but I can't figure out why they are bogging down now.

So why are four times fewer draw calls resulting in a frame time four times as long (40 fps down to 10 fps)? What am I missing as to why this is so slow? I'm supposed to be more efficient!

Edit: Here is the shader code:

vertex TerrainFragmentIn terrainVertex(const device TerrainVertex* verts [[ buffer(0) ]],
                            uint v_id [[ vertex_id ]],
                            constant Constants &mvp [[buffer(1)]],
                            constant ModelMatrix &modelMat [[buffer(2)]]
                            ) {
    TerrainVertex vert = verts[v_id];

    TerrainFragmentIn outVertex;
    outVertex.position = mvp.viewProjectionMatrix * modelMat.modelMatrix * float4(vert.position.x,vert.position.y,0,1);
    outVertex.shadow  = vert.shadow;
    outVertex.uv = vert.tex;
    return outVertex;
}

fragment float4 terrainFragment(TerrainFragmentIn inFrag [[stage_in]],
                            texture2d<float, access::sample> colorTexture [[ texture(0) ]],
                            sampler colorSampler [[ sampler(0) ]]) {
    float4 color = colorTexture.sample(colorSampler, inFrag.uv);
    color *= float4(inFrag.shadow,inFrag.shadow,inFrag.shadow,1.0);

    return color * 2;
}

Those structs are defined as follows, and are identical on the Swift side:

struct TerrainVertex {
    float2 position [[ attribute(0) ]];
    float2 tex      [[ attribute(1) ]];
    float shadow    [[ attribute(2) ]];
};

struct Constants {
    float4x4 viewProjectionMatrix;
};

struct ModelMatrix {
    float4x4 modelMatrix;
};
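Because the vertex shader reads `verts[v_id]` through a `device` pointer, "identical on the Swift side" has to mean the same memory layout, in particular the same stride. The question does not show its Swift definitions, so this is a minimal sketch assuming the Swift mirror uses the standard library's `SIMD2<Float>` and `Float`; `vertexBufferLength(count:)` is a hypothetical helper. In MSL, `float2` is 8 bytes with 8-byte alignment, so `TerrainVertex` holds 20 bytes of data padded to 24.

```swift
/// Swift mirror of the MSL TerrainVertex struct (field types are an
/// assumption; the question does not show its Swift code).
struct TerrainVertex {
    var position: SIMD2<Float>  // MSL float2: 8 bytes, 8-byte aligned, offset 0
    var tex: SIMD2<Float>       // MSL float2: offset 8
    var shadow: Float           // MSL float: offset 16
    // 20 bytes of data; the stride is rounded up to 24, a multiple of the
    // 8-byte alignment, which matches sizeof(TerrainVertex) in MSL.
}

/// Hypothetical helper: byte length of a vertex buffer holding `count` vertices.
/// Uses stride rather than size so that element i starts where verts[i]
/// expects it in the shader.
func vertexBufferLength(count: Int) -> Int {
    return MemoryLayout<TerrainVertex>.stride * count
}
```

If the two strides ever disagree, every vertex after the first is read from the wrong offset, which usually shows up as garbled geometry rather than slowness, so this check mostly rules out one possible cause.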

There is almost no state change between 90% of the calls (screenshot of the draw calls: DrawCalls).

Each triangle is around 20x30 pixels, and I'm not doing any blending on these calls. I actually wish I were getting diagnostics, warnings, or errors in the capture, but sadly there is nothing. I'm testing on a Retina MacBook Pro (rMPB) running macOS 10.12.3; I haven't tried on iOS yet.

Answers (1)

Columbo

Your screenshot shows an instance count of 120.

In your calls to drawPrimitives, are you passing the primitive count as the instanceCount parameter? It looks like it. If you are, then each draw call renders 120 * 120 = 14400 triangles, which would explain why consolidating draw calls is worsening your frame rate: each call draws the square of its triangle count, so the bigger the batch, the more work is wasted.
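Applying the same reasoning to the call quoted in the question (vertexCount 348, instanceCount 116), assuming it follows the same pattern, the instance count turns out to equal the triangle count exactly:

$$
\frac{348\ \text{vertices}}{3\ \text{vertices per triangle}} = 116\ \text{triangles per instance}
$$

$$
116\ \text{triangles} \times 116\ \text{instances} = 13{,}456\ \text{triangles rasterized, instead of } 116
$$

Since terrainVertex computes the position from vertex_id and the uniforms only, all 116 instances land on exactly the same pixels, so the extra work is pure overdraw that never changes the image.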

If you're not using instancing (and your shaders suggest you aren't, since the vertex shader never reads [[ instance_id ]]), then you should set instanceCount to 1.
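A minimal sketch of the corrected host-side call, assuming each consolidated strip is a contiguous vertex range drawn as a triangle list. `TerrainStrip` and `encode(_:with:)` are hypothetical names; `drawPrimitives(type:vertexStart:vertexCount:instanceCount:)` is the real `MTLRenderCommandEncoder` method. The Metal part is wrapped in `#if canImport(Metal)` so the struct also compiles on platforms without Metal.

```swift
#if canImport(Metal)
import Metal
#endif

/// Hypothetical record for one consolidated terrain strip: a contiguous
/// range of vertices in the shared vertex buffer, drawn as a triangle list.
struct TerrainStrip {
    let vertexStart: Int
    let vertexCount: Int

    /// Triangles this strip should produce: three vertices per triangle.
    var triangleCount: Int {
        precondition(vertexCount % 3 == 0, "a triangle list needs a multiple of 3 vertices")
        return vertexCount / 3
    }
}

#if canImport(Metal)
/// Encodes one strip as a single, non-instanced draw.
/// instanceCount is 1 because terrainVertex reads only [[ vertex_id ]];
/// passing triangleCount here would redraw the same triangles that many times.
func encode(_ strip: TerrainStrip, with encoder: MTLRenderCommandEncoder) {
    encoder.drawPrimitives(type: .triangle,
                           vertexStart: strip.vertexStart,
                           vertexCount: strip.vertexCount,
                           instanceCount: 1)
}
#endif
```

Dropping the argument entirely and calling `drawPrimitives(type:vertexStart:vertexCount:)` also draws a single instance.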
