Reputation: 6240
I have a set of very efficient horizontal and vertical line drawing functions which can draw many pixels per cycle (for the horizontal one, ~4 pixels/cycle, and the vertical one ~0.25 pixels/cycle.) I'm using Bresenham's line drawing algorithm to draw arbitrary lines. However, this involves calling the single draw pixel routine, which is relatively slow (~0.03 pixels/cycle.)
I've noticed most of the lines drawn by Bresenham's algorithm show horizontal and vertical strips with a good distance between them. Does anyone know if it would be possible to replace these individual calls to DrawPixel with calls to the DrawHoriz and DrawVert drawing routines? Does anyone have a code example? I've tried already, but I've had limited success, mostly resulting in broken output - I suspect I'm approaching it the wrong way.
Many thanks
Upvotes: 1
Views: 1937
Reputation: 3830
The following applies to all code.
The number 4 is relevant because it's mine, or rather it's shown to me by multiple CPUs and rendering methods, arrived at from too much profiling in 1994 on several platforms.
The first optimization is to try to move as much as you can out of a loop. This is Bresenham.
Then, you try repetition, clustering, shortcuts, and much more.
A loop with its remains still functioning still requires initialization, loop instructions, and exit condition check. This makes 3.
If you then found a way to beat the loop instructions, you still require detection and switching to the special-case. This makes 4.
If you have many occurrences, and can load them in a sorted way (instead of adding the cost of sorting them), you can get the switch close to 0 cost.
Within the octant symmetry, just as many slopes are close to horizontal or vertical as they are to 45 degrees, but very few of the close to horizontal or vertical ones are more than 4 pixels per step.
I.e., 4+ horizontal or 4+ vertical slopes will be rare. You can see that this is the case with just 16x16 grid paper or larger, and count the cases within one octant.
This puts restrictions on how much special-cases improve measured execution time, and you should instead use my number 4 as your guide when writing code. As in, never assume your code will be called once, but always 4+, and you write optimized code.
If you have CPU or GPU parallelization available, you can set up your loading in the same way, and with the same single switch to the rest when the core, pipeline, etc is out of data.
Parallelization of code left within the Bresenham loop is difficult, obviously a tree of if-else, which can be written with minimal branch prediction penalty on ARM and other CPUs, and CPU-internal optimizations. It can and has been implemented in hardware.
Today, if aliased line drawing is desired, it's best introduced into the rendering pipeline of Vulkan as a form of drawing all the lines (as the entire frame is rendered, with all the other types of geometry you want displayed).
Upvotes: 0
Reputation: 81704
The Bresenham algorithm is normally implemented as a set of four special cases, depending on the sign and magnitude of the line's slope (steep positive, steep negative, shallow positive, shallow negative.) In the steep cases, you end up drawing a bunch of vertical line segments, each shifted by one horizontal pixel; for the shallow ones, it's horizontal lines.
As you compute the line coordinates, you step the faster moving one (y for the steep cases, x for the shallow ones) and you calculate the cumulative error in the other coordinate; you change it to the next value when the cumulative error gets to 1. So what if you ran a normal Bresenham algorithm, computing all the pixels, but then instead of drawing each pixel, you just drew the line for a given "fast" coordinate from the beginning to end of the "slow" coordinate -- if you know what I mean. Instead of drawing at each cycle, in other words, you just drew when the slow coordinate changed, right before you bump it to the next value.
Yes, I'm sure this would work. Whether it would be any faster, I have no idea, but it would definitely work.
Upvotes: 2