Reputation: 21
I'm currently trying out some things in SDL_gpu/C++ and have the following setup. Both images are 32 by 32 pixels, and the second image is transparent.
// ..sdl init..
GPU_Image* image = GPU_LoadImage("path");
GPU_Image* image2 = GPU_LoadImage("otherpath");

for (int i = 0; i < screenheight; i += 32) {
    for (int j = 0; j < screenwidth; j += 32) {
        GPU_Blit(image, NULL, screen, j, i);
        GPU_Blit(image2, NULL, screen, j, i);
    }
}
This code runs at ~20 FPS on a WQHD-sized screen. However, when I do the following
for (int i = 0; i < screenheight; i += 32) {
    for (int j = 0; j < screenwidth; j += 32) {
        GPU_Blit(image, NULL, screen, j, i);
    }
}

for (int i = 0; i < screenheight; i += 32) {
    for (int j = 0; j < screenwidth; j += 32) {
        GPU_Blit(image2, NULL, screen, j, i);
    }
}
i.e. separate the two blit calls into two different for loops, I get 300 FPS.
Can someone explain this to me, or does anyone have an idea what might be going on here?
Upvotes: 2
Views: 573
Reputation: 63
While both examples render the same number of textures, the first one forces the GPU to make hundreds or thousands of texture binds (depending on screen size), while the second makes only 2 texture binds.
The cost of rendering a texture is very cheap on modern GPUs, while texture binds (switching to another texture) are quite expensive.
Note that you can use a texture atlas to alleviate the texture-bind bottleneck while retaining the desired render order.
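The effect of the two loop orders on bind count can be sketched with a small, SDL-free simulation. The blit/bind bookkeeping below is a stand-in for what the renderer does (a bind only happens when the requested texture differs from the one already bound), not SDL_gpu's actual code:

#include <cstdio>

// Simulated "currently bound texture" state.
static int g_bound = -1;
static int g_binds = 0;

void blit(int texture_id) {
    if (texture_id != g_bound) {  // switching textures costs a bind
        g_bound = texture_id;
        ++g_binds;
    }
}

int main() {
    const int w = 2560, h = 1440, tile = 32;  // WQHD, 32x32 tiles

    // Pattern 1: interleaved blits, as in the question's first version.
    g_bound = -1; g_binds = 0;
    for (int i = 0; i < h; i += tile)
        for (int j = 0; j < w; j += tile) {
            blit(0);  // image
            blit(1);  // image2
        }
    int interleaved = g_binds;

    // Pattern 2: one full pass per image, as in the second version.
    g_bound = -1; g_binds = 0;
    for (int i = 0; i < h; i += tile)
        for (int j = 0; j < w; j += tile)
            blit(0);
    for (int i = 0; i < h; i += tile)
        for (int j = 0; j < w; j += tile)
            blit(1);
    int grouped = g_binds;

    printf("interleaved: %d binds, grouped: %d binds\n", interleaved, grouped);
    return 0;
}

At 2560x1440 with 32-pixel tiles, that is 80 x 45 = 3600 tile positions, so the interleaved pattern switches textures 7200 times per frame while the grouped pattern switches twice. A texture atlas would reduce it to one.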
Upvotes: 0
Reputation: 13269
While cache locality might have an impact, I don't think it is the main issue here, especially considering the drop in frame time from 50 ms to 3.3 ms.
The call of interest is of course GPU_Blit, which is defined in the SDL_gpu source as making some checks followed by a call to _gpu_current_renderer->impl->Blit. This Blit function seems to refer to the same one, regardless of the renderer.
A lot of code in there makes use of the image parameter, but two functions in particular, prepareToRenderImage and bindTexture, call FlushBlitBuffer several times if you are not rendering the same thing as in the previous blit. That looks to me like an expensive operation. I haven't used SDL_gpu before, so I can't guarantee anything, but it necessarily makes more glDraw* calls if you render something other than what you rendered previously than if you render the same thing again and again. And glDraw* calls are usually the most expensive API calls in an OpenGL application.
It's relatively well known in 3D graphics that making as few changes to the rendering context (in this case, the image to blit) as possible can improve performance, simply because it makes better use of the bandwidth between CPU and GPU. A typical example is grouping together all the rendering that uses a particular set of textures (e.g. materials). In your case, that means grouping all the rendering of one image, and then all of the other.
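The grouping idea generalizes beyond two fixed loops: queue the draws, sort them by texture, then submit. A minimal sketch (the DrawCall struct and count_binds helper are hypothetical illustrations, not part of SDL_gpu):

#include <algorithm>
#include <cstdio>
#include <vector>

// A queued draw: which texture it uses and where it goes.
struct DrawCall {
    int texture_id;
    float x, y;
};

// Count the texture binds a given submission order would cost,
// assuming a bind only happens when the texture changes.
int count_binds(const std::vector<DrawCall>& calls) {
    int binds = 0, bound = -1;
    for (const DrawCall& c : calls) {
        if (c.texture_id != bound) { bound = c.texture_id; ++binds; }
    }
    return binds;
}

int main() {
    // Interleaved submission, like the question's first loop.
    std::vector<DrawCall> calls;
    for (int i = 0; i < 4; ++i) {
        calls.push_back({0, float(i * 32), 0.0f});
        calls.push_back({1, float(i * 32), 0.0f});
    }
    int before = count_binds(calls);  // every call switches textures

    // Group by texture before submitting; stable_sort keeps the
    // relative draw order within each texture.
    std::stable_sort(calls.begin(), calls.end(),
                     [](const DrawCall& a, const DrawCall& b) {
                         return a.texture_id < b.texture_id;
                     });
    int after = count_binds(calls);  // one bind per texture

    printf("binds before: %d, after: %d\n", before, after);
    return 0;
}

Note that sorting changes submission order, which matters for overlapping translucent sprites; here it reproduces exactly the two-pass order of your faster version (all of image, then all of image2).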
Upvotes: 3