How is Win32 Bitmap rendering faster than pixels?

Question

Win32 bitmaps are (a lot) faster to draw than calling SetPixelV or a similar per-pixel function. How does this work, if in the end the computer is still drawing pixels for the bitmap?

Yakk - Adam Nevraumont · Accepted Answer

Suppose you have a pixel. This pixel has color components A B and C. The surface you are drawing to has color components X Y and Z.

So first you need to check whether the two formats match. If they don't, a conversion is needed and costs go up. Assume they match.

Next, you need to do bounds checking -- did the caller give you something stupid? Some comparisons, additions and multiplications.

Next, you need to find where the pixel is. This takes a few multiplications and additions.
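To make the bounds check and the address calculation concrete, here is a minimal sketch in portable C++. The `Surface` struct and both function names are hypothetical, made up for illustration; this is not how GDI lays things out internally, only the general shape of the work any per-pixel call has to do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical surface description; not a Win32 type.
struct Surface {
    std::uint8_t* pixels;       // first byte of the top row
    int width;                  // in pixels
    int height;                 // in rows
    std::size_t stride;         // bytes per row, may include padding
    std::size_t bytesPerPixel;  // e.g. 4 for 32-bit color
};

// Bounds check: four comparisons per pixel.
inline bool in_bounds(const Surface& s, int x, int y) {
    return x >= 0 && x < s.width && y >= 0 && y < s.height;
}

// Address calculation: two multiplications and one addition,
// plus one more addition when the offset is applied to s.pixels.
inline std::size_t pixel_offset(const Surface& s, int x, int y) {
    assert(in_bounds(s, x, y));
    return static_cast<std::size_t>(y) * s.stride
         + static_cast<std::size_t>(x) * s.bytesPerPixel;
}
```

None of this is expensive on its own; the cost comes from repeating it for every single pixel.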

Now, you have to access the source data and the destination data and write it.

If you are working a scanline at a time, almost all of that overhead above can be done once. You can calculate what part of the scanline falls in bounds or not with only a bit more overhead than doing one pixel. You can find where the scanline writes in the destination with again only a bit more overhead than one pixel. You can check color space conversions with the same overhead as one pixel.

The big difference is that instead of copying one pixel, you copy in a block.
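The contrast can be sketched as follows, again in portable C++ rather than real GDI code. `Image`, `set_pixel`, `blit_per_pixel` and `blit_scanlines` are all hypothetical names, and the sketch assumes source and destination share the same 32-bit pixel format (the "assume they match" case above). The per-pixel version repeats the checks and address math for every pixel; the scanline version clips once and then issues one `std::memcpy` per row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 32-bit-per-pixel image; not a Win32 type.
struct Image {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;  // row-major, no row padding

    Image(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}
};

// Per-pixel path: bounds check and address math on every call.
inline void set_pixel(Image& dst, int x, int y, std::uint32_t color) {
    if (x < 0 || x >= dst.width || y < 0 || y >= dst.height) return;
    dst.pixels[static_cast<std::size_t>(y) * dst.width + x] = color;
}

void blit_per_pixel(Image& dst, const Image& src, int dx, int dy) {
    for (int y = 0; y < src.height; ++y)
        for (int x = 0; x < src.width; ++x)
            set_pixel(dst, dx + x, dy + y,
                      src.pixels[static_cast<std::size_t>(y) * src.width + x]);
}

// Per-scanline path: clip the rectangle once, then copy each row as a block.
void blit_scanlines(Image& dst, const Image& src, int dx, int dy) {
    assert(src.pixels.size() == static_cast<std::size_t>(src.width) * src.height);
    const int x0 = std::max(0, -dx);                      // first visible source column
    const int x1 = std::min(src.width, dst.width - dx);   // one past the last
    const int y0 = std::max(0, -dy);                      // first visible source row
    const int y1 = std::min(src.height, dst.height - dy); // one past the last
    if (x0 >= x1 || y0 >= y1) return;                     // nothing visible

    const std::size_t bytes =
        static_cast<std::size_t>(x1 - x0) * sizeof(std::uint32_t);
    for (int y = y0; y < y1; ++y) {
        const std::uint32_t* from =
            &src.pixels[static_cast<std::size_t>(y) * src.width + x0];
        std::uint32_t* to =
            &dst.pixels[static_cast<std::size_t>(dy + y) * dst.width + (dx + x0)];
        std::memcpy(to, from, bytes);  // one block copy per scanline
    }
}
```

Both functions produce the same destination image; the second simply does the clipping arithmetic once instead of once per pixel and hands the copying to `std::memcpy`, which the standard library is free to implement with wide or vectorized moves.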

As it happens, computers are really good at copying blocks of things. There are built-in instructions on some CPUs, and some memory systems can do it without the CPU being involved (CPU says "copy X to Y", then can do other things; and memory-to-memory bandwidth might be higher than memory-to-CPU-to-memory). Even if you are round-tripping through the CPU, there are SIMD instructions that let you work on 2, 4, 8, 16 or even more units of data at the same time, so long as you work on them in the same way using a limited instruction set.

In some cases, you can even offload work to the GPU -- if both source and destination scanline are on the GPU, you can say "yo GPU, you handle it", and the GPU is even more specialized for doing that kind of task.

The very first bit of optimization -- only having to do checks once per scanline instead of once per pixel -- can easily give you a 2x to ~10x speedup. The second -- more efficient blitting -- another 4x to ~20x faster. Doing everything on the GPU can be ~2x to 100x faster.

The final thing is the overhead of actually calling the function. Usually this is minor; but when calling SetPixel 1 million times (a 1000 x 1000 image, or a modest sized screen) it adds up.

For an HD display with 2 million pixels, 60 times per second is 120 million pixels manipulated per second. A single threaded program on a 3 GHz machine only has room to run ~25 instructions per pixel if you want to keep up with the screen, and that assumes nothing else happens (which is unlikely). On a 4k monitor you are down to 6 instructions per pixel.
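The budget figures above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it treats one clock cycle as roughly one instruction, as the paragraph does, and it assumes 3840 x 2160 for the 4k monitor. The function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Clock cycles available per pixel per frame (integer division, rounds down).
constexpr std::int64_t budget_per_pixel(std::int64_t clock_hz,
                                        std::int64_t pixels,
                                        std::int64_t fps) {
    return clock_hz / (pixels * fps);
}

constexpr std::int64_t kClockHz = 3000000000LL;  // 3 GHz, single thread

// HD: ~2 million pixels at 60 Hz is 120 million pixels per second.
constexpr std::int64_t kHdBudget = budget_per_pixel(kClockHz, 2000000, 60);

// 4k: 3840 * 2160 = 8,294,400 pixels at 60 Hz (assumed resolution).
constexpr std::int64_t k4kBudget = budget_per_pixel(kClockHz, 3840LL * 2160, 60);

static_assert(kHdBudget == 25, "about 25 cycles per pixel on HD");
static_assert(k4kBudget == 6, "about 6 cycles per pixel on 4k");
```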

With that many pixels being played with, shaving off every instruction you can makes a big difference.

The multipliers above are pulled out of nowhere. That said, I've converted per-pixel operations to per-scanline operations and seen impressive speedups, and I've seen the same from moving work from the CPU to the GPU and from SIMD.
