Infestor

Reputation: 21

SDL_gpu: Why is blitting two images in two separate for loops way faster?

So I'm currently trying out some stuff with SDL_gpu/C++ and I have the following setup. Both images are 32 by 32 pixels, and the second one is transparent.

    //..sdl init..//
    GPU_Image* image = GPU_LoadImage("path");
    GPU_Image* image2 = GPU_LoadImage("otherpath");
    for (int i = 0; i < screenheight; i += 32) {
        for (int j = 0; j < screenwidth; j += 32) {
            GPU_Blit(image, NULL, screen, j, i);
            GPU_Blit(image2, NULL, screen, j, i);
        }
    }

With a WQHD-sized screen, this code runs at ~20 FPS. When I do the following, however,

    for (int i = 0; i < screenheight; i += 32) {
        for (int j = 0; j < screenwidth; j += 32) {
            GPU_Blit(image, NULL, screen, j, i);
        }
    }

    for (int i = 0; i < screenheight; i += 32) {
        for (int j = 0; j < screenwidth; j += 32) {
            GPU_Blit(image2, NULL, screen, j, i);
        }
    }

i.e. separate the two blit calls into two different for loops, I get 300 FPS.

Can someone explain this to me, or does anyone have an idea what might be going on here?

Upvotes: 2

Views: 573

Answers (2)

Chris

Reputation: 63

While both examples render the same number of textures, the first one forces the GPU to perform hundreds or thousands of texture binds (depending on screen size), while the second needs only 2 texture binds.

Rendering a texture is very cheap on modern GPUs, while texture binds (switching to another texture) are quite expensive.

Note that you can use a texture atlas to alleviate the texture bind bottleneck while retaining the desired render order.
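
For example, here is a rough sketch of the atlas approach, assuming a hypothetical atlas.png that packs both 32x32 tiles side by side into a 64x32 image (the file name and layout are assumptions, not from your code):

    GPU_Image* atlas = GPU_LoadImage("atlas.png");  // hypothetical packed atlas

    // Sub-rectangles into the atlas (GPU_Rect holds float x, y, w, h)
    GPU_Rect tile1 = {0.0f, 0.0f, 32.0f, 32.0f};
    GPU_Rect tile2 = {32.0f, 0.0f, 32.0f, 32.0f};

    for (int i = 0; i < screenheight; i += 32) {
        for (int j = 0; j < screenwidth; j += 32) {
            // Both blits sample the same texture, so no rebind between them
            GPU_Blit(atlas, &tile1, screen, j, i);
            GPU_Blit(atlas, &tile2, screen, j, i);
        }
    }

This keeps the interleaved draw order from your first snippet while only ever binding a single texture.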

Upvotes: 0

Nelfeal

Reputation: 13269

While cache locality might have an impact, I don't think it is the main issue here, especially considering the drop in frame time from 50 ms to 3.3 ms.

The call of interest is of course GPU_Blit, which is defined here as making some checks followed by a call to _gpu_current_renderer->impl->Blit. This Blit function seems to refer to the same one, regardless of the renderer. It's defined here.

A lot of code in there makes use of the image parameter, but two functions in particular, prepareToRenderImage and bindTexture, call FlushBlitBuffer several times if you are not rendering the same thing as in the previous blit. That looks to me like an expensive operation. I haven't used SDL_gpu before, so I can't guarantee anything, but it necessarily makes more glDraw* calls when you render something different from what you rendered previously than when you render the same thing again and again. And glDraw* calls are usually the most expensive API calls in an OpenGL application.
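
To get a feel for the numbers, here is a toy model of such a batching blitter (purely illustrative, not SDL_gpu's actual code): it only "draws" when the bound texture changes, and counts how many draw calls each blit order produces for a WQHD grid of 32-pixel tiles.

    #include <cstdio>

    // Toy model of a batching blitter (illustrative only, not SDL_gpu's code):
    // it accumulates quads for the currently "bound" texture and issues one
    // draw call whenever the texture changes.
    struct ToyBatcher {
        const void* bound = nullptr;  // currently bound texture
        int pending = 0;              // quads accumulated since the last flush
        int draw_calls = 0;           // total flushes (glDraw*-like calls)

        void blit(const void* texture) {
            if (texture != bound) { flush(); bound = texture; }
            ++pending;                // appending vertices is the cheap part
        }
        void flush() {
            if (pending > 0) { ++draw_calls; pending = 0; }
        }
    };

    int main() {
        const int tiles_x = 2560 / 32, tiles_y = 1440 / 32;  // WQHD grid
        int texA = 0, texB = 0;       // stand-ins for the two images

        ToyBatcher interleaved;       // both blits inside the same loop
        for (int i = 0; i < tiles_y; ++i)
            for (int j = 0; j < tiles_x; ++j) {
                interleaved.blit(&texA);
                interleaved.blit(&texB);
            }
        interleaved.flush();

        ToyBatcher grouped;           // one loop per image
        for (int i = 0; i < tiles_y; ++i)
            for (int j = 0; j < tiles_x; ++j) grouped.blit(&texA);
        for (int i = 0; i < tiles_y; ++i)
            for (int j = 0; j < tiles_x; ++j) grouped.blit(&texB);
        grouped.flush();

        std::printf("interleaved: %d draw calls\n", interleaved.draw_calls);
        std::printf("grouped:     %d draw calls\n", grouped.draw_calls);
    }

With an 80x45 grid, the interleaved order ends up at 7200 flushes per frame versus 2 for the grouped order, which is the kind of difference that shows up directly in frame time.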

It's relatively well known in 3D graphics that making as few changes to the context (in this case, the image to blit) as possible can improve performance, simply because it makes better use of the bandwidth between CPU and GPU. A typical example is grouping together all the rendering that uses some particular set of textures (e.g. materials). In your case, it's grouping all the rendering of one image, and then of the other image.
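
If you don't want to restructure every loop by hand, one option (just a sketch with made-up names, not part of SDL_gpu's API) is to queue blits and sort them by texture before submitting:

    #include <algorithm>
    #include <vector>
    #include "SDL_gpu.h"

    // Hypothetical helper: queue blits, then submit them sorted by texture so
    // that each image is bound as few times as possible per frame.
    struct BlitCmd { GPU_Image* image; float x, y; };

    void render_sorted(std::vector<BlitCmd>& cmds, GPU_Target* screen) {
        std::stable_sort(cmds.begin(), cmds.end(),
            [](const BlitCmd& a, const BlitCmd& b) { return a.image < b.image; });
        for (const BlitCmd& c : cmds)
            GPU_Blit(c.image, NULL, screen, c.x, c.y);
    }

Keep in mind that this reorders draws across textures, so it only applies when that order doesn't matter; since your transparent tiles have to land on top, you would sort by layer first and by texture second, which in this simple case is exactly what your two separate loops already achieve.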

Upvotes: 3
