Milcho
Milcho

Reputation: 339

Opengl - "fullscreen" texture rendering performance issue

I am writing a 2D game in openGL and I ran into some performance problems while rendering a couple of textures covering the whole window.

What I do is actually create a texture with the size of the screen, render my scene onto that texture using FBO and then render the texture a couple of times with a different offsets to get a kind of "shadow" going. But when I do that I get a massive performance drop while using my integrated video card.

So all in all I render 7 quads onto the whole screen(background image, 5 "shadow images" with a black "tint" and the same texture with its true colors). I am using RGBA textures which are 1024x1024 in size and fit them in a 900x700 window. I am getting 200 FPS when I am not rendering the textures and 34 FPS when I do (in both scenarios I actually create the texture and render the scene onto it). I find this quite odd because I am only rendering 7 quads essentially. A strange thing is also that when I run a CPU profiler it doesn't suggest that this is the bottleneck(I know that opengl uses a pipeline architecture and this thing can happen but most of the times it doesn't).

When I use my external video card I get consistent 200 FPS when I do the tests above. But when I disable the scene rendering onto the texture and disable the texture rendering onto the screen I get ~1000 FPS. This happens only to my external video card - when I disable the FBO using the integrated one I get the same 200 FPS. This really confuses me.

Can anyone explain what's going on and if the above numbers sound right?

Integrated video card - Intel HD Graphics 4000

External video card - NVIDIA GeForce GTX 660M

P.S. I am writing my game in C# - so I use OpenTK if that is of any help.

Edit:

First of all thanks for all of the responses - they were all very helpful in a way, but unfortunately I think there is just a little bit more to it than just "simplify/optimize your code". Let me share some of my rendering code:

//fields defined when the program is initialized

Rectangle viewport;
//Texture with the size of the viewport
Texture fboTexture;
FBO fbo;

//called every frame
public void Render()
{
    //bind the texture to the fbo
    GL.BindFramebuffer(FramebufferTarget.Framebuffer, fbo.handle);
    GL.FramebufferTexture2D(FramebufferTarget.Framebuffer, fboTexture,
       TextureTarget.Texture2D, texture.TextureID, level: 0);

    //Begin rendering in Ortho 2D space
    GL.MatrixMode(MatrixMode.Projection);
    GL.PushMatrix();
    GL.LoadIdentity();
    GL.Ortho(viewport.Left, viewport.Right, viewport.Top, viewport.Bottom, -1.0, 1.0);
    GL.MatrixMode(MatrixMode.Modelview);
    GL.PushMatrix();
    GL.LoadIdentity();

    GL.PushAttrib(AttribMask.ViewportBit);
    GL.Viewport(viewport);

    //Render the scene - this is really simple I render some quads using shaders
    RenderScene();

    //Back to Perspective
    GL.PopAttrib(); // pop viewport
    GL.MatrixMode(MatrixMode.Projection);
    GL.PopMatrix();
    GL.MatrixMode(MatrixMode.Modelview);
    GL.PopMatrix();

    //Detach the texture
    GL.FramebufferTexture2D(FramebufferTarget.Framebuffer, fboTexture, 0,
                    0, level: 0);
    //Unbind the fbo
    GL.BindFramebuffer(FramebufferTarget.Framebuffer, 0);

    GL.PushMatrix();
    GL.Color4(Color.Black.WithAlpha(128)); //Sets the color to (0,0,0,128) in a RGBA format

    for (int i = 0; i < 5; i++)
    {
        GL.Translate(-1, -1, 0);
        //Simple Draw method which binds the texture and draws a quad at (0;0) with
        //its size
        fboTexture.Draw();
    }
    GL.PopMatrix();
    GL.Color4(Color.White);
    fboTexture.Draw();
}

So I don't think there is actually anything wrong with the fbo and rendering onto the texture, because this is not causing the program to slow down on both of my cards. Previously I was initializing the fbo every frame and that might have been the reason for my Nvidia card to slow down, but now when I am pre-initializing everything I get the same FPS both with and without fbo.

I think the problem is not with the textures in general because if I disable textures and just render the untextured quads I get the same result. And still I think that my integrated card should run faster than 40 FPS when rendering only 7 quads on the screen, even if they cover the whole of it.

Can you give me some tips on how can I actually profile this and post back the result? That would be really useful.

Edit 2:

Ok I experimented a bit and managed to get much better performance. First I tried rendering the final quads with a shader - this didn't have any impact on performance as I expected.

Then I tried to run a profiler. But I far as I know SlimTune is just a CPU profiler and it didn't give me the results I wanted. Then I tried gDEBugger. It has an integration with visual studio which I later found out that it does not support .NET projects. I tried running the external version but it didn't seem to work (but maybe I just haven't played with it enough).

The thing that really did the trick was that rather than rendering the 7 quads directly to the screen I first render them on a texture, again using fbo, and then render the final texture once onto the screen. This got my fps from 40 to 120. Again this seem kind of curios to say the least. Why is rendering to a texture way faster than directly rendering to the screen? Nevertheless thanks for the help everyone - it seems that I have fixed my problem. I would really appreciate if someone come up with reasonable explanation of the situation.

Upvotes: 4

Views: 3623

Answers (3)

MisterMetaphor
MisterMetaphor

Reputation: 6018

It isn't just about number of quads you render, and I believe in your case it's got more to do with amout of triangle filling your video card has to do.

As was mentioned, the common way to do fullscreen post-processing is with shaders. If you want better performance on your integrated card and can't use shaders, then you should simplify your rendering routine.

Make sure you really need alpha blending. On some cards/drivers rendering textures with alpha channel can significantly reduce performance.

A somewhat low-quality way to reduce the amount of fullscreen filling would be to first perform all of your shadow draws on another, smaller texture (say, 256x256 instead of 1024x1024). Then you would draw a quad with that compound shadow texture onto your buffer. This way instead of 7 1024x1024 quads you would only need 6 256x256 and one 1024x1024. But you will lose in resolution.

Another technique, and I'm not sure it can be applied in your case, is to pre-render your complex background so you'll have to do less drawing in your rendering loop.

Upvotes: 1

Matt Kline
Matt Kline

Reputation: 10507

Obviously this is a guess since I haven't seen or profiled your code, but I would guess that integrated cards are just struggling with your post-processing (drawing the texture several times to achieve your "shadow" effect).

I don't know your level of familiarity with these concepts, so sorry if I'm a bit verbose here.

About Post-Processing

Post-processing is the process of taking your completed scene, rendered to a texture, and applying effects to the image before displaying it on the screen. Typical uses of post-processing include:

  • Bloom - Simulate brightness more naturally by "bleeding" bright pixels into neighboring darker ones.

  • High Dynamic Range rendering - Bloom's big brother. The scene is rendered to a floating-point texture, allowing greater color ranges (as opposed to the usual 0 for black and 1 for full brightness). The final colors displayed on the screen are calculated using the average luminance of all the pixels on the screen. The effect of all of this is that the camera acts somewhat like the human eye - in a dark room, a bright light (say, through a window) looks extremely bright, but once you get outside, the camera adjusts and light only looks that bright if you stare directly at the sun.

  • Cel-shading - Colors are modified to give a cartoon-like appearance.

  • Motion blur

  • Depth of field - The in-game camera approximates a real one (or your eyes), where only objects at a certain distance are in-focus and the rest are blurry.

  • Deferred shading - A fairly advanced application of post-processing where lighting is calculated after the scene has been rendered. This costs a lot of video RAM (it usually uses several fullscreen textures) but allows a large number of lights to be added to the scene quickly.

In short, you can use post-processing for a lot of neat tricks. Unfortunately...

Post Processing Has a Cost

The cool thing about post-processing is that its cost is independent of the scene's geometric complexity - it will take the same amount of time whether you drew a million triangles or whether you drew a dozen. That's also its drawback, however. Even though you're only rendering a quad over and over to do post-processing, there is a cost for rendering each pixel. If you were to use a larger texture, the cost would be larger.

A dedicated graphics card obviously has far more computing resources to apply post-processing, whereas an integrated card usually has much fewer resources it can apply. It is for this reason that "low" graphics settings on video games often disable many post-processing effects. This wouldn't show up as a bottleneck on a CPU profiler because the delay happens on the graphics card. The CPU is waiting for the graphics card to finish before continuing your program (or, more accurately, the CPU is running another program while it waits for the graphics card to finish).

How Can You Speed Things Up?

  • Use fewer passes. If you halve the passes, you halve the time it takes to do post-processing. To that end,

  • Use shaders. Since I didn't see you mention them anywhere, I'm not sure if you're using shaders for your post-processing. Shaders essentially allow you to write a function in a C-like language (since you're in OpenGL, you can use either GLSL or Cg) which is run on every rendered pixel of an object. They can take any parameters you like, and are extremely useful for post-processing. You set the quad to be drawn using your shader, and then you can insert whatever algorithm you'd like to be run on every pixel of your scene.

Upvotes: 7

Robert Rouhani
Robert Rouhani

Reputation: 14688

Seeing some code would be nice. If the only difference between the two is using an external GPU or not, the difference could be in memory management (ie how and when you're creating an FBO, etc.), since streaming data to the GPU can be slow. Try moving anything that creates any sort of OpenGL buffer or sends any sort of data to it to initialization. I can't really give any more detailed advice without seeing exactly what you're doing.

Upvotes: 1

Related Questions