karyon

Reputation: 2547

Why does adding a color attachment cause a 3x slowdown?

I was rendering some models depth-only like this:

// Framebuffer with only a 32-bit float depth attachment
m_fbo = new globjects::Framebuffer();
depthBuffer = globjects::Texture::createDefault();
depthBuffer->storage2D(1, GL_DEPTH_COMPONENT32F, size, size);
m_fbo->attachTexture(GL_DEPTH_ATTACHMENT, depthBuffer);

m_fbo->bind();
// ... draw all the things

Now, when I added a color attachment to it like this:

// Same depth attachment as before
m_fbo = new globjects::Framebuffer();
depthBuffer = globjects::Texture::createDefault();
depthBuffer->storage2D(1, GL_DEPTH_COMPONENT32F, size, size);
m_fbo->attachTexture(GL_DEPTH_ATTACHMENT, depthBuffer);

// Additional color attachment; <format> is one of the formats listed below
attributeBuffer = globjects::Texture::createDefault();
attributeBuffer->storage2D(1, <format>, size, size);
m_fbo->attachTexture(GL_COLOR_ATTACHMENT0, attributeBuffer);

m_fbo->bind();
// ... draw all the things

Depending on the format of the attribute buffer, rendering time went from 2.6 ms to 5 ms (R8, RG8), 8.5 ms (RGB8, RGBA8, R32F) or 14.5 ms (RG32F, RGBA32F), measured with OpenGL timer queries.
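To make the pattern in those timings easier to see, here is a small sketch that divides the extra time over the depth-only baseline by the texel size of each format. The byte sizes are nominal values and an assumption on my part (a driver may, for example, store RGB8 padded to 4 bytes), and the names `FormatTiming`, `kTimings` and `extraMsPerByte` are made up for this illustration.

```cpp
#include <array>

// Timings are the measurements quoted above. Texel sizes are nominal
// (assumption: the driver may pad some formats, e.g. RGB8 to 4 bytes).
struct FormatTiming {
    const char* name;
    int nominalBytesPerTexel;
    double renderMs;
};

constexpr double kDepthOnlyMs = 2.6;  // depth-only baseline

constexpr std::array<FormatTiming, 7> kTimings = {{
    {"R8",      1,  5.0},
    {"RG8",     2,  5.0},
    {"RGB8",    3,  8.5},
    {"RGBA8",   4,  8.5},
    {"R32F",    4,  8.5},
    {"RG32F",   8, 14.5},
    {"RGBA32F", 16, 14.5},
}};

// Extra time over the depth-only baseline per nominal byte of texel size.
constexpr double extraMsPerByte(const FormatTiming& t) {
    return (t.renderMs - kDepthOnlyMs) / t.nominalBytesPerTexel;
}
```

With these numbers the result is about 2.4 ms per byte for R8 but only about 0.74 ms per byte for RGBA32F, so the cost does not scale linearly with texel size. That would be consistent with something other than raw bandwidth being the limit, though that is only a guess.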

I didn't even change the fragment shader, so I don't calculate any additional values to write into that color buffer. Rendering time goes down again if I comment out that attachTexture line.

The texture at hand is a 2Kx2K shadowmap atlas. The program I use tessellates the models, converts each triangle to a point and renders that point with gl_PointSize = 1 into one randomly chosen 64x64 tile inside that atlas. The tessellation and geometry shaders are quite heavy, so I don't think this is bandwidth or fillrate bound. The slowdown is much smaller (1.9 to 2.1 ms) if I render into one large shadowmap instead of multiple small ones.

If I manually write the attributes into a texture with imageStore in the geometry shader and don't use a color attachment, the slowdown is reasonable as well (1.9 to 2.3 ms).

Also, this slowdown mysteriously disappears when I start tracing with Nsight, which makes it impossible to profile.

Any ideas why this might happen?

I'm using a 750 Ti.

Upvotes: 1

Views: 599

Answers (1)

Tara

Reputation: 1796

Like most OpenGL performance issues, this one is implementation-dependent, so without knowing how the actual implementation works we can only guess.

  1. GPUs are usually optimized for depth-only rendering. Since you're adding a color attachment, you're not doing depth-only rendering anymore.
  2. Your color attachment format is GL_R32F. This format is most likely slower to render to than a regular old GL_RGBA8 format.
  3. If you change render targets every time you switch to a randomly chosen 64x64 shadowmap, this is really slow. Changing a render target is a very costly operation, but there are some ways around it. See page 29 of this presentation: http://http.download.nvidia.com/developer/presentations/2005/GDC/OpenGL_Day/OpenGL_FrameBuffer_Object.pdf
  4. If I understood you correctly, you're rendering lots of 1-pixel triangles. This is very slow, because GPUs rasterize and shade pixels in groups of 2x2. Even if only one pixel gets rendered, the hardware will still run the shader 4 times and then discard 3 of the results. If all you're rendering are 1-pixel triangles, you're effectively wasting 3/4 of your rasterization performance.
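
The arithmetic behind point 4 can be sketched as follows. This is a simplified model that assumes every touched 2x2 quad always costs 4 shader invocations; it is not a description of any particular GPU, and the function names are made up for illustration.

```cpp
// Simplified model (assumption): fragments are shaded in 2x2 quads,
// so every touched quad costs 4 shader invocations.

// Fraction of shader invocations that produce a visible pixel.
constexpr double quadUtilization(int coveredPixels, int touchedQuads) {
    return static_cast<double>(coveredPixels) / (4.0 * touchedQuads);
}

// Fraction of shader invocations whose result is discarded.
constexpr double wastedFraction(int coveredPixels, int touchedQuads) {
    return 1.0 - quadUtilization(coveredPixels, touchedQuads);
}
```

For a primitive that covers a single pixel in a single quad, `quadUtilization(1, 1)` is 0.25 and `wastedFraction(1, 1)` is 0.75, which is the 3/4 mentioned above. A primitive that fills a whole quad wastes nothing in this model.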

May I ask why you're using the depth attachment format GL_DEPTH_COMPONENT32F? Not every GPU handles 32-bit depth buffers equally well; 24 bits is the more common choice. Did you try using GL_DEPTH_COMPONENT24 or GL_DEPTH_COMPONENT32 instead?

This is a very peculiar problem though. Did you try updating your GPU drivers?

Upvotes: 1
