Reputation: 1821
I have a list of (let's say 600) simple polygons (i.e. no self-intersections), each with at least 4 and at most 3000 points. I want to compute the convex hull of each of these polygons on the GPU, i.e. I need 600 convex hulls in the end.
Melkman's convex hull algorithm is a popular choice for simple polygons and runs in O(n). However, it is inherently sequential. Of course, I could process all polygons in parallel with one thread per input polygon, but I am looking for something that is also parallel within each polygon.
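For reference, the one-thread-per-polygon baseline I mean would look roughly like this (untested sketch; the flat point buffer, the offset/count arrays and MAX_PTS are just my illustration of the data layout):

```cuda
#include <cuda_runtime.h>

struct Pt { float x, y; };

// >0 if c lies to the left of the directed line a->b (counter-clockwise turn)
__device__ float cross2(Pt a, Pt b, Pt c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

constexpr int MAX_PTS = 3000;   // per-polygon maximum from the problem statement

// pts:      all polygon vertices, concatenated
// offset:   start index of polygon p in pts
// count:    number of vertices of polygon p
// hullOut:  hull vertices of polygon p, written starting at offset[p]
// hullCnt:  number of hull vertices of polygon p
__global__ void melkmanPerPolygon(const Pt* pts, const int* offset, const int* count,
                                  Pt* hullOut, int* hullCnt, int numPolys)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= numPolys) return;

    const Pt* v = pts + offset[p];
    int n = count[p];

    Pt d[2 * MAX_PTS + 1];              // deque, lives in (slow) local memory
    int bot = n - 2, top = bot + 3;

    d[bot] = d[top] = v[2];             // initialise deque with the first three vertices
    if (cross2(v[0], v[1], v[2]) > 0.f) { d[bot + 1] = v[0]; d[bot + 2] = v[1]; }
    else                                { d[bot + 1] = v[1]; d[bot + 2] = v[0]; }

    for (int i = 3; i < n; ++i) {
        // vertex inside the current hull: nothing to do
        if (cross2(d[bot], d[bot + 1], v[i]) > 0.f &&
            cross2(d[top - 1], d[top], v[i]) > 0.f)
            continue;
        while (cross2(d[top - 1], d[top], v[i]) <= 0.f) --top;   // pop from the top
        d[++top] = v[i];
        while (cross2(d[bot], d[bot + 1], v[i]) <= 0.f) ++bot;   // pop from the bottom
        d[--bot] = v[i];
    }

    int h = top - bot;                  // d[bot] == d[top], so h distinct hull vertices
    hullCnt[p] = h;
    for (int i = 0; i < h; ++i) hullOut[offset[p] + i] = d[bot + i];
}
```

With only ~600 polygons this occupies very few threads and the per-thread deque spills to local memory, which is exactly why I would like per-polygon parallelism.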
In the literature I did not find anything suitable. Most papers deal with arbitrary point clouds as input, where a filtering step is done in parallel on the GPU. The result of that filtering is a simple polygon, which is then fed into Melkman's algorithm. But since I already start with a simple polygon for which I need the convex hull, this does not help.
Is there a parallel convex hull algorithm for simple polygons?
Upvotes: 2
Views: 228
Reputation: 4191
There is a paper (see below) in which the author proposes exactly the kind of algorithm you are looking for:
However, this paper is old. Fortunately, there is another, more recent paper in which Wagener's algorithm is discussed together with its CUDA implementation:
I haven't played with this algorithm myself, so this answer is essentially just a couple of links - but it might be a starting point for you.
Upvotes: 3
Reputation: 11920
Why not use physics? Treat all points as atoms. Each atom exerts a force on every other atom, or only on atoms within a close neighborhood. Convex hull atoms (surface atoms) experience the largest net force, because there are no opposing atoms on the outside to cancel the force coming from the inside. So one can compute the forces with an FFT-based convolution (O(n log n)), sort the atoms by the length of their net force vector (O(n log n)), and then select atoms in that order until a candidate is found that lies inside the hull of the atoms selected so far (O(h^2), where h is the number of convex hull points).
3000 points (3 dimensions, 32 bit each) take 36 kB of data and fit into the shared memory of a single SM on the CUDA architecture. So each polygon can be processed by a single block of threads (64-1024 threads).
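A minimal sketch of the force step under these assumptions: one block per polygon with the points staged in shared memory. For brevity it uses direct O(n^2) pair summation per polygon instead of the FFT-based convolution, 2D points as in the question, and made-up buffer names:

```cuda
#include <cuda_runtime.h>

constexpr int MAX_PTS = 3000;            // 3000 float2 points = 24 kB of shared memory

// points2d: all polygon vertices, concatenated; offsets/counts: per-polygon layout
// forceMag: net force magnitude per vertex, same layout as points2d
__global__ void netForcePerPoint(const float2* points2d, const int* offsets,
                                 const int* counts, float* forceMag)
{
    __shared__ float2 s[MAX_PTS];        // this block's polygon
    int poly = blockIdx.x;               // one block per polygon
    int n    = counts[poly];
    const float2* v = points2d + offsets[poly];

    for (int i = threadIdx.x; i < n; i += blockDim.x)   // cooperative load
        s[i] = v[i];
    __syncthreads();

    // each thread accumulates the net repulsive (1/r^2) force on a subset of atoms
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        float fx = 0.f, fy = 0.f;
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;
            float dx = s[i].x - s[j].x;
            float dy = s[i].y - s[j].y;
            float r2 = dx * dx + dy * dy + 1e-12f;   // avoid division by zero
            float inv = rsqrtf(r2);
            float w = inv * inv * inv;               // 1/r^2 force along the unit vector
            fx += dx * w;
            fy += dy * w;
        }
        forceMag[offsets[poly] + i] = sqrtf(fx * fx + fy * fy);
    }
}
```

Launched as netForcePerPoint<<<numPolygons, 256>>>(...), the resulting force magnitudes can then be sorted per polygon (e.g. with a segmented sort) before the selection step.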
For highly variable amounts of work per task (polygon), a parent kernel can be launched that spawns child kernels with a thread count chosen per polygon, so that each polygon is processed with a near-optimal number of threads. This reduces the number of wasted threads, and dynamic parallelism keeps the launch overhead low compared to launching the child kernels from the host.
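A rough sketch of that launch pattern, assuming some per-polygon child kernel (processPolygon below is only a stub standing in for e.g. the force kernel above); it needs to be compiled with relocatable device code (-rdc=true) on a GPU that supports dynamic parallelism:

```cuda
#include <cuda_runtime.h>

// stand-in for the real per-polygon kernel (e.g. the force kernel above)
__global__ void processPolygon(const float2* pts, int n, float* out)
{
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        out[i] = pts[i].x + pts[i].y;          // placeholder per-point work
}

// parent kernel: one thread per polygon, each thread launches a child grid
// whose block size is matched to its polygon's point count
__global__ void launchPerPolygon(const float2* points2d, const int* offsets,
                                 const int* counts, float* out, int numPolys)
{
    int poly = blockIdx.x * blockDim.x + threadIdx.x;
    if (poly >= numPolys) return;

    int n = counts[poly];
    // warp-aligned block size between 64 and 1024 threads
    int threads = max(64, min(1024, ((n + 31) / 32) * 32));

    processPolygon<<<1, threads>>>(points2d + offsets[poly], n, out + offsets[poly]);
}
```

The host then launches only the parent once, e.g. launchPerPolygon<<<(numPolys + 127) / 128, 128>>>(...).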
Upvotes: 2