Why is drawing POT images faster than NPOT images?

Question

I was looking into canvas speed optimizations, and I found this answer: https://stackoverflow.com/a/7682200/999400

don't use images with odd widths. always use widths as powers of 2.

So I'm wondering, why is this faster?

I have seen posts that explain that this helps with old graphics cards (when using OpenGL & such), but I'm talking about speed, not compatibility, and about canvas, not OpenGL/WebGL.

enhzflep · Accepted Answer

It's faster because you can use the << operator rather than the * operator, i.e. it's faster to perform a 'shift left by 1' (multiply by two) than it is to perform a 'multiply by 43'. For widths that aren't a power of 2, one can get around this by adding padding bytes to the end of each row of the image (as MS did for in-memory bitmaps), but essentially, it's a consequence of the speed difference between the two instructions.
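To make the padding idea concrete, here is a minimal C sketch. All of the names in it (next_pow2, stride_shift, offset_mul, offset_shift) are made up for illustration, and it assumes one byte per pixel. The row width is rounded up to a power of two so that the row multiply can be replaced by a shift.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round a row width up to the next power of two.
   Assumes n <= 2^31 so the loop terminates. */
static uint32_t next_pow2(uint32_t n)
{
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical helper: log2 of a power-of-two stride, i.e. the shift count. */
static unsigned stride_shift(uint32_t pow2_stride)
{
    unsigned shift = 0;
    while ((1u << shift) < pow2_stride)
        shift++;
    return shift;
}

/* Arbitrary width: the row offset needs a general multiply. */
static size_t offset_mul(uint32_t x, uint32_t y, uint32_t width)
{
    return (size_t)y * width + x;
}

/* Power-of-two stride: the multiply becomes a left shift. */
static size_t offset_shift(uint32_t x, uint32_t y, unsigned shift)
{
    return ((size_t)y << shift) + x;
}
```

With a 43-pixel-wide image, next_pow2(43) gives a stride of 64, so each row carries 21 unused padding bytes and the row offset is just y << 6. The trade-off is memory for arithmetic.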

In the old days of 8-bit 320x200 (mode 13h), you could index a pixel with the simple formula:

pixOffset = xPos + yPos * 320;

But this was slooow. A much better alternative was to use

C

pixOffset = xPos + (yPos * 256) + (yPos * 64);

Asm

mov ax, xPos    ;   ax = xPos
mov bx, yPos    ;   bx = yPos
shl bx, 6       ;   bx = yPos * 64
add ax, bx      ;   ax = xPos + (yPos * 64)
shl bx, 2       ;   bx = yPos * 256
add ax, bx      ;   ax = xPos + yPos * 320

This may seem counter-intuitive, but when well written, it only uses single-clock instructions, i.e. you could calculate the offset in 6 clock cycles. Of course, pipelining and cache misses complicate the scenario.
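The same decomposition (320 = 256 + 64) can be written with explicit shifts in C. This is only a sketch: the function names are invented for illustration, and the last helper simply checks that the shift version agrees with the plain multiply for every pixel of a 320x200 screen.

```c
#include <stdint.h>

enum { SCREEN_W = 320, SCREEN_H = 200 };

/* Straightforward version: one general multiply per pixel. */
static uint16_t mode13h_offset_mul(uint16_t x, uint16_t y)
{
    return (uint16_t)(x + y * SCREEN_W);
}

/* Shift version: y * 320 == (y << 8) + (y << 6), i.e. y*256 + y*64. */
static uint16_t mode13h_offset_shift(uint16_t x, uint16_t y)
{
    return (uint16_t)(x + (y << 8) + (y << 6));
}

/* Returns 1 if both versions agree for every on-screen coordinate. */
static int mode13h_offsets_agree(void)
{
    for (uint16_t y = 0; y < SCREEN_H; y++)
        for (uint16_t x = 0; x < SCREEN_W; x++)
            if (mode13h_offset_mul(x, y) != mode13h_offset_shift(x, y))
                return 0;
    return 1;
}
```

The largest offset, for pixel (319, 199), is 63999, which is why a 16-bit result is enough here. A modern optimizing compiler will usually make this kind of substitution on its own when the width is a compile-time constant, so writing the shifts by hand is mostly of historical interest.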

It's also far, far cheaper to implement shift registers in hardware than a full multiplication unit, both in $$ and transistors. Consequently, the same number of transistors can be used to provide better performance, or fewer can be used for the same performance at lower power dissipation.

AFAIK, the mul (and div) instructions of modern processors are implemented with the help of look-up tables. This has, for the most part, mitigated the problem, but it isn't without its problems either. For further reading, look into the Pentium FDIV bug (a look-up table was wrongly populated inside the chips):

http://en.wikipedia.org/wiki/Pentium_FDIV_bug

So in closing, it's essentially an artefact of the hardware/software used to implement the functionality.
