Karim Bahgat

Reputation: 3041

Coordinate container types in Python Aggdraw for fastest possible rendering?

Original Question:

I have a question about the Python Aggdraw module that I cannot find in the Aggdraw documentation. I'm using the ".polygon" command which renders a polygon on an image object and takes input coordinates as its argument.

My question is if anyone knows or has experience with what types of sequence containers the xy coordinates can be in (list, tuple, generator, itertools-generator, array, numpy-array, deque, etc), and most importantly which input type will help Aggdraw render the image in the fastest possible way?

The docs only mention that the polygon method takes: "A Python sequence (x, y, x, y, …)"

I'm thinking that Aggdraw is optimized for some sequence types more than others, and/or that some sequence types have to be converted first, and thus some types will be faster than others. So maybe someone knows these details about Aggdraw's inner workings, either in theory or from experience?

I have done some preliminary testing, and will do more soon, but I still want to know the theory behind why one option might be faster, because it might be that I am not doing the tests properly or that there are additional ways to optimize Aggdraw rendering that I don't know about.

(Btw, this may seem like trivial optimization, but not when the goal is to render tens of thousands of polygons quickly and to be able to zoom in and out of them. So for this question I don't want suggestions for other rendering modules (from my testing Aggdraw appears to be one of the fastest anyway). I also know that there are other optimization bottlenecks like coordinate-to-pixel transformations etc., but for now I'm only focusing on the final step: Aggdraw's internal rendering speed.)

Thanks a bunch, curious to see what knowledge and experience others out there have with Aggdraw.


A Winner? Some Preliminary Tests

I have now conducted some preliminary tests and reported the results in an Answer further down the page if you want the details. The main finding is that rounding float coordinates to integer pixel coordinates and storing them in arrays is the fastest way to make Aggdraw render an image or map. It leads to rendering speedups of roughly 6.5x, at speeds comparable to well-known and commonly used GIS software. What remains is to find fast ways to do the coordinate transformations and shapefile loading, and these are daunting tasks indeed. For all the findings check out my Answer post further down the page.

I'm still interested to hear if you have done any tests of your own, or if you have other useful answers or comments. I'm still curious about the answers to the Bonus question if anyone knows.


Bonus question:

If you don't know the specific answer to this question, it might still help if you know which programming language the actual Aggdraw rendering is done in. I've read that the Aggdraw module is just a Python binding for the original C++ Anti-Grain Geometry library, but I'm not entirely sure what that actually means. Does it mean that the Aggdraw Python commands are simply a way of accessing and activating the C++ library "behind the scenes", so that the actual rendering is done in C++ and at C++ speeds? If so, I would guess that C++ has to convert the Python sequence to a C++ sequence, and the optimization would be to find out which Python sequence can be converted the fastest. Or is the Aggdraw module simply the original library rewritten in pure Python (and thus much slower than the C++ version)? If so, which Python types does it support, and which is faster for the type of rendering work it has to do?

Upvotes: 1

Views: 440

Answers (1)

Karim Bahgat

Reputation: 3041

A Winner? Some Preliminary Tests

Here are the results from my initial tests of which input types are faster for aggdraw rendering. One clue was in the aggdraw docs, which say that the polygon method only takes "sequences", officially defined as "str, unicode, list, tuple, bytearray, buffer, xrange" (http://docs.python.org/2/library/stdtypes.html). Luckily, I found that aggdraw rendering also accepts some additional input types. After some testing I came up with this list of the input container types that aggdraw (and maybe also PIL) rendering supports:

  • tuples
  • lists
  • arrays
  • Numpy arrays
  • deques

Unfortunately, aggdraw raises an error when the coordinates are supplied in any of the following:

  • generators
  • itertools generators
  • sets
  • dictionaries
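Since generators are rejected, one workaround is to materialize them into a supported container before drawing. Here is a minimal sketch, assuming the coordinates arrive as a generator of (x, y) pairs; the helper name flatten_pairs and the sample data are made up for illustration, and the resulting tuple or array is what would be handed to draw.polygon:

```python
from array import array
from itertools import chain

def flatten_pairs(pairs):
    """Flatten an iterable of (x, y) pairs into a lazy x, y, x, y, ... stream."""
    return chain.from_iterable(pairs)

# Hypothetical input: a generator of (x, y) pixel coordinates.
pairs = ((x * 10.0, x * 5.0) for x in range(3))

# Materialize once; a generator can only be consumed a single time.
flat = tuple(flatten_pairs(pairs))

# The same data as a typed array of C doubles.
flat_array = array("d", flat)
```

The conversion itself costs time, so it belongs in the preparation step, outside any timed rendering loop.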

And then for the performance testing! The test polygons were a subset of 20 000 (multi)polygons from the Global Administrative Areas (GADM) database of worldwide sub-national province boundaries, loaded into memory using the PyShp shapefile reader module (http://code.google.com/p/pyshp/). To ensure that the tests only measured aggdraw's internal rendering speed, I started the timer only after the polygon coordinates had been transformed to aggdraw image pixel coordinates, AND after I had created a list of input arguments with the correct input type and the aggdraw.Pen and .Brush objects. I then ran and timed the rendering using itertools.starmap with the preloaded coordinates and arguments:

import itertools
import time

t = time.time()
iterat = itertools.starmap(draw.polygon, args)  # draw is the aggdraw.Draw() object, args the preloaded argument list
for runfunc in iterat:  # iterating through the starmap iterator consumes it and runs each call
    pass
print(time.time() - t)
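To run the same loop over several container types, the timing can be wrapped in a small helper. This is only a sketch, not the exact script behind my numbers: time_calls and fake_polygon are hypothetical names, and fake_polygon stands in for draw.polygon so that the snippet runs without aggdraw installed.

```python
import itertools
import time
from array import array

def time_calls(func, args):
    """Call func(*a) for every a in args and return the elapsed seconds."""
    start = time.time()
    for _ in itertools.starmap(func, args):
        pass  # iterating consumes the starmap iterator and runs each call
    return time.time() - start

def fake_polygon(xy, pen=None, brush=None):
    """Stand-in for draw.polygon: just walks the coordinate sequence."""
    return sum(xy)

coords = [float(i) for i in range(1000)]
containers = {
    "tuples": tuple(coords),
    "arrays": array("d", coords),
    "lists": list(coords),
}

timings = {}
for name, xy in containers.items():
    # Build the argument list BEFORE starting the timer.
    args = [(xy,)] * 200
    timings[name] = time_calls(fake_polygon, args)
```

With the real draw.polygon each entry in args would also carry the pen and brush, and the numbers from the stand-in say nothing about aggdraw itself.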

My findings confirm the traditional notion that tuples and arrays are the fastest Python containers to iterate over: both came out on top. Lists were about 50% slower, and so were numpy arrays (this was initially surprising given the speed reputation of Numpy arrays, but then I read that Numpy arrays are only fast when one uses the internal Numpy functions on them, and that for normal Python iteration they are generally slower than other types). Deques, usually considered to be fast, turned out to be the slowest (almost 100% slower, i.e. close to 2x the time).

### Coordinates as FLOATS
### Pure rendering time (seconds) for 20 000 polygons from the GADM dataset
tuples   8.90130587328
arrays   9.03419164657
lists    13.424952522
numpy    13.1880489246
deque    16.8887938784

In other words, if you usually use lists for aggdraw coordinates, you should know that lists took about 50% longer to render than tuples or arrays in these tests, so simply putting the coordinates into a tuple or array makes rendering roughly 1.5x faster. Not the most radical improvement, but still useful and easy to implement.

But wait! I found another way to squeeze more performance out of the aggdraw module, quite a lot actually. I forget why I tried it, but when I rounded the transformed floating-point coordinates to the nearest pixel and converted them to integer type (i.e. "int(round(eachcoordinate))") before rendering, I got roughly a 6.5x rendering speedup for arrays (9.03 s down to 1.41 s), and about 9.5x compared with float coordinates in lists, the most common container. That is a worthwhile and easy optimization. Surprisingly, arrays turn out to be about 1.5x faster than tuples (1.41 s vs 2.20 s) when the renderer doesn't have to deal with fractional coordinates.

This prerounding leads to no loss of visual detail that I could see. My guess is that each floating-point coordinate ends up assigned to a single pixel anyway, so preconverting the coordinates saves aggdraw from doing that work itself. A potential caveat is that removing the decimal information could change how aggdraw does its anti-aliasing, but in my opinion the final map still looks equally anti-aliased and smooth. Finally, this optimization must be weighed against the time it takes to round the numbers in Python, but from what I can see the cost of prerounding does not outweigh the rendering speedup. Faster ways to round and convert the coordinates should be explored further.
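The prerounding step can be sketched as follows. The helper name to_pixel_ints and the sample coordinates are made up for illustration, and the "l" typecode (signed C long) is an assumption, since which integer array type performs best was not something I tested separately:

```python
from array import array

def to_pixel_ints(coords):
    """Round float pixel coordinates to the nearest integer pixel.

    Returns a typed array of signed C longs (typecode "l", an assumed choice).
    """
    return array("l", (int(round(c)) for c in coords))

# Hypothetical flat x, y, x, y, ... sequence already in pixel space.
float_coords = (10.4, 20.6, 30.2, 40.8, 15.0, 5.3)
int_coords = to_pixel_ints(float_coords)
```

As with any conversion, this should run once during preparation so that the rendering loop only sees the finished integer array.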

### Coordinates as INTEGERS (rounded to pixels)
### Pure rendering time (seconds) for 20 000 polygons from the GADM dataset
arrays   1.40970077294
tuples   2.19892537074
lists    6.70839555276
numpy    6.47806400659
deque    7.57472232757
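The speedup ratios can be checked directly from the two sets of timings. This snippet only does arithmetic on the numbers reported in this answer; the variable names are arbitrary:

```python
# Timings in seconds, copied from the two result listings in this answer.
floats = {"tuples": 8.90130587328, "arrays": 9.03419164657,
          "lists": 13.424952522, "numpy": 13.1880489246,
          "deque": 16.8887938784}
ints = {"arrays": 1.40970077294, "tuples": 2.19892537074,
        "lists": 6.70839555276, "numpy": 6.47806400659,
        "deque": 7.57472232757}

# Speedup from prerounding, per container type (float time / integer time).
speedup = {name: floats[name] / ints[name] for name in floats}

# Best integer result compared with the common float-list baseline.
best_vs_float_lists = floats["lists"] / min(ints.values())
```

Arrays gain the most from prerounding (roughly 6.4x), lists and numpy arrays only about 2x, and integer arrays are about 9.5x faster than float lists.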

In conclusion then: arrays and tuples are the fastest container types to use when providing aggdraw (and possibly also PIL?) with drawing coordinates.

Given the hefty rendering speeds that can be obtained when using the correct input type with aggdraw, it becomes particularly crucial and rewarding to find even the slightest optimizations for other aspects of the map rendering process, such as coordinate transformation routines (I am already exploring and finding for instance that Numpy is particularly fast for such purposes).
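For reference, the coordinate transformation step amounts to a scale and offset per axis. Below is a dependency-free sketch, assuming a simple bounding-box-to-image mapping; make_transform and the example extent are hypothetical, and a Numpy version would apply the same arithmetic to whole coordinate arrays at once:

```python
def make_transform(bbox, width, height):
    """Return a function mapping (x, y) map coordinates to pixel coordinates.

    bbox is (xmin, ymin, xmax, ymax) in map units; the y axis is flipped
    because image rows grow downward.
    """
    xmin, ymin, xmax, ymax = bbox
    xscale = width / float(xmax - xmin)
    yscale = height / float(ymax - ymin)

    def transform(x, y):
        return (x - xmin) * xscale, (ymax - y) * yscale

    return transform

# Hypothetical whole-world extent drawn on a 360x180 pixel image.
to_pixels = make_transform((-180.0, -90.0, 180.0, 90.0), 360, 180)
top_left = to_pixels(-180.0, 90.0)
center = to_pixels(0.0, 0.0)
```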

A more general finding from all of this is that Python can potentially be used for very fast map rendering applications, which further opens the possibilities for Python geospatial scripting. For example, the entire GADM dataset of 200 000+ provinces could theoretically be rendered in about 1.5*10=15 seconds (ten times the 20 000-polygon test), not counting the transformation from map coordinates to image coordinates. That is much faster than QGIS and even ArcGIS, which in my experience struggles with displaying the GADM dataset.

All results were obtained on an 8-core, 2-year-old Windows 7 machine, using Python 2.6.5. Whether these container types are also the most efficient when it comes to loading and/or processing the data is a question that has to be tested and answered in another post. It would be interesting to hear if someone else already has any good insights on these aspects.

Upvotes: 1
