damian

Reputation: 3674

graphics: best performance with floating point accumulation images

I need to speed up some particle system eye candy I'm working on. The eye candy involves additive blending, accumulation, and trails and glow on the particles. At the moment I'm rendering by hand into a floating point image buffer, converting to unsigned chars at the last minute then uploading to an OpenGL texture. To simulate glow I'm rendering the same texture multiple times at different resolutions and different offsets. This is proving to be too slow, so I'm looking at changing something. The problem is, my dev hardware is an Intel GMA950, but the target machine has an Nvidia GeForce 8800, so it is difficult to profile OpenGL stuff at this stage.

I did some very unscientific profiling and found that most of the slowdown comes from dealing with the float image: scaling all the pixels by a constant to fade them out, then converting the float image to unsigned chars and uploading it to the graphics hardware. So, I'm looking at the following options for optimization:

Have you any experience with any of these possibilities? Any thoughts, advice? Something else I haven't thought of?

Upvotes: 2

Views: 1740

Answers (4)

Crashworks

Reputation: 41384

It's best to move the rendering calculation for massive particle systems like this over to the GPU, which has hardware optimized to do exactly this job as fast as possible.

Aaron is right: represent each individual particle with a sprite. You can calculate the movement of the sprites in space (e.g., accumulate their position per frame) on the CPU using SSE2, but do all the additive blending and accumulation on the GPU via OpenGL. (Drawing sprites additively is easy enough.) You can handle your trails and blur by doing it in shaders (the "pro" way), by rendering to an accumulation buffer and back, or by simply generating a bunch of additional sprites on the CPU to represent the trail and throwing them at the rasterizer.
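As a rough illustration of the CPU half of that split, here is a minimal sketch of a per-frame position update in plain C. The `Particles` struct and `particles_step` function are hypothetical names, not from any library, and the structure-of-arrays layout is an assumption chosen because contiguous float arrays are the kind of loop a compiler (or hand-written SSE2) can process four values at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure-of-arrays particle store: one contiguous
   float array per component keeps the update loop easy to vectorize. */
typedef struct {
    float *x, *y;    /* positions */
    float *vx, *vy;  /* velocities */
    size_t count;    /* number of particles */
} Particles;

/* Advance every particle by one frame: position += velocity * dt. */
void particles_step(Particles *p, float dt)
{
    assert(p != NULL);
    for (size_t i = 0; i < p->count; ++i) {
        p->x[i] += p->vx[i] * dt;
        p->y[i] += p->vy[i] * dt;
    }
}
```

The resulting positions would then be handed to OpenGL as sprite vertices, with the blending itself left to the GPU.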

Upvotes: 2

Nils Pipenbrinck

Reputation: 86343

The problem is simply the sheer amount of data you have to process.

Your float buffer is 9 megabytes in size, and you touch the data more than once. Most likely your rendering loop looks somewhat like this:

  • Clear the buffer
  • Render something on it (uses reads and writes)
  • Convert to unsigned bytes
  • Upload to OpenGL

That's a lot of data to move around, and the cache can't help you much because the image is much larger than your cache. Let's assume you touch every pixel five times. If so, you move 45 MB of data in and out of the slow main memory. 45 MB does not sound like much data, but consider that almost every memory access will be a cache miss. The CPU will spend most of its time waiting for the data to arrive.
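The arithmetic can be checked with a few lines of C. The image dimensions below are an assumption (the thread does not state them): a 1024x768 image with three float channels happens to come out at exactly 9 MB, which matches the figure above. Both function names are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by a float image with the given dimensions. */
size_t float_image_bytes(size_t width, size_t height, size_t channels)
{
    return width * height * channels * sizeof(float);
}

/* Total bytes moved through main memory if the whole image
   is touched `passes` times and never stays in cache. */
size_t traffic_bytes(size_t image_bytes, size_t passes)
{
    assert(passes > 0);
    return image_bytes * passes;
}
```

With the assumed dimensions and 4-byte floats, `float_image_bytes(1024, 768, 3)` is 9 * 1024 * 1024 bytes, and five passes give 45 times that.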

If you want to stay on the CPU to do the rendering there's not much you can do. Some ideas:

  • Using SSE non-temporal loads and stores may help, but they will complicate your task quite a bit (you have to align your reads and writes).

  • Try breaking your rendering up into tiles, e.g. do everything on smaller rectangles (256*256 or so). The idea is that you actually get a benefit from the cache: after you've cleared a rectangle, for example, that entire rectangle will be in the cache. Rendering and converting to bytes will then be a lot faster because there is no need to fetch the data from the relatively slow main memory anymore.

  • Last resort: Reduce the resolution of your particle effect. This will give you a good bang for the buck at the cost of visual quality.
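The tiling idea from the list above can be sketched as follows. This is only an illustration under stated assumptions: a single-channel float image with values nominally in [0, 1], a fade pass followed by a byte-conversion pass, and made-up names (`TILE`, `to_byte`, `fade_and_convert_tiled`). The point is that both passes run over one small rectangle before moving on, so the second pass finds its data still in cache.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 256  /* tile edge in pixels; 256*256 floats = 256 KB */

/* Clamp a float in the nominal range [0,1] to an 8-bit value. */
static unsigned char to_byte(float v)
{
    if (v <= 0.0f) return 0;
    if (v >= 1.0f) return 255;
    return (unsigned char)(v * 255.0f + 0.5f);
}

/* Fade a single-channel float image in place and convert it to bytes,
   running both passes on one tile before moving to the next. */
void fade_and_convert_tiled(float *img, unsigned char *out,
                            size_t width, size_t height, float fade)
{
    assert(img != NULL && out != NULL);
    for (size_t ty = 0; ty < height; ty += TILE) {
        size_t y_end = (ty + TILE < height) ? ty + TILE : height;
        for (size_t tx = 0; tx < width; tx += TILE) {
            size_t x_end = (tx + TILE < width) ? tx + TILE : width;

            /* Pass 1: fade every pixel of this tile. */
            for (size_t y = ty; y < y_end; ++y)
                for (size_t x = tx; x < x_end; ++x)
                    img[y * width + x] *= fade;

            /* Pass 2: convert the same tile while it is still cached. */
            for (size_t y = ty; y < y_end; ++y)
                for (size_t x = tx; x < x_end; ++x)
                    out[y * width + x] = to_byte(img[y * width + x]);
        }
    }
}
```

Fusing the two passes into one loop would cut memory traffic further; they are kept separate here only to show that tiling helps even when the rendering needs several passes.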

The best solution is to move the rendering onto the graphics card. Render-to-texture functionality is standard these days. It's a bit tricky to get working with OpenGL because you have to decide which extension to use, but once you have it working, performance is no longer an issue.

Btw - do you really need floating point render-targets? If you can get away with 3 bytes per pixel you will see a nice performance improvement.
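To give an idea of what accumulation looks like at 3 bytes per pixel, here is a minimal sketch of a saturating additive blend on 8-bit channels. The function name `add_saturate_u8` is made up for this example. Three bytes per pixel is a quarter of the 12 bytes that three floats take, but the trade-off is visible in the clamp: anything brighter than 255 is lost, so fades and glows computed from over-bright pixels will look different than with floats.

```c
#include <assert.h>
#include <stddef.h>

/* Add src into dst byte by byte, clamping each sum at 255
   (a saturating additive blend over n channel values). */
void add_saturate_u8(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; ++i) {
        unsigned sum = (unsigned)dst[i] + src[i];
        dst[i] = (unsigned char)(sum > 255u ? 255u : sum);
    }
}
```

For an RGB image, `n` would be width * height * 3.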

Upvotes: 4

unwind

Reputation: 399813

If by "manual" you mean that you are using the CPU to poke pixels, I think pretty much anything you can do that draws textured polygons using OpenGL instead will be a huge speedup.

Upvotes: 1

Aaron Digulla

Reputation: 328594

Try to replace the manual code with sprites: An OpenGL texture with an alpha of, say, 10%. Then draw lots of them on the screen (ten of them in the same place to get the full glow).

Upvotes: 1
