JoeTaicoon
JoeTaicoon

Reputation: 1567

C++ AMP fixed length recursion and tree traversal

Using C++ AMP I need to travel a quad tree with a maximal depth of 10. I was aware that C++ AMP does not support recursion, due to it not requiring the device to have a stack, but I was expecting it to be able to roll out a small limited recursion depth call tree, so it could still be inlined and handled without a stack.

Like

int recur(int i) restrict(amp) {
    if (i <= 1) return 1;
    else if (i > 5) return 1000; //just dummy code to limit the depth
    else return i + recur(i - 1);
}

No suck luck, it seems. Is there really no exceptions to the rule of no recursion?

In relation to this question, I wonder what is the common way to travel a tree using C++ AMP. Obviously I can make my own per-thread stack and then push current node before going deeper, and pop on the way back up, or I can extend my tree to hold "pointers" back up the tree and not only downwards, but...

being unable to find any examples showing C++ AMP tree traversal I wonder what a best practice here would be?

Edit: I am doing this for neighbor finding in a 2D domain. I have n randomly positioned points and need to find the nearest neighbors for each point. The points are in a single 1D array ordered by a space filling curve ( https://en.wikipedia.org/wiki/Z-order_curve ), so points which are close in the 1D array tend to be close in 2D as well, which in turn means that they tend to share many of the same neighbors as well.

Upvotes: 0

Views: 239

Answers (2)

Pruyque
Pruyque

Reputation: 382

If you know a maximum recursion depth, you could use templates to let the compiler roll out your recursion up to the maximum depth:

template <int I>
int recur(int i) restrict(amp) {
    if (i <= 1) return 1;
    else if (i > 5) return 1000; //just dummy code to limit the depth
    else return i + recur<I-1>(i - 1);
}

template<>
int recur<0>(int i) restrict(amp)
{ // maximum depth reached...
    return 1000;
}   

void f() restrict(amp)
{
    recur<10 /* maximum depth */>(3);
}

Upvotes: 0

Quxflux
Quxflux

Reputation: 3303

Although this question is outdated, I want to give some advice which applies not only to C++ AMP but to all GPGPU applications trying to perform nearest neighbour queries.

There are several ways to circumvent the usage of a stack in the GPU kernel. One approach which I was using (in OpenCL) is the utilisation of so called history-flags. The approach is described in a GDC presentation called Parallelizing the Physics Pipeline.
The idea is to store 4 bits per level of the tree. Each 4 bits encode unambiguously which node needs to be visited next. Unfortunately the needed size for the history-flags is too large to fit in thread-private memory, you would have to place them in tile_static memory where you have to watch out for bank conflicts.

Performing a radius search in my experience comes with some problems which are not to be solved efficiently on a GPU. First, since you are performing a radius search, you don't know how many neighbours there will be for a single point, but you will have to provide space to save the corresponding points beforehand. In the end you will end up with an kNN-search with a limited search-radius. Depending on the problem you are trying to solve, this might already be a deal breaker.
Since you need to find the closest points, you somehow have to sort the nearest neighbour candidates which ends up in some sort of heap structure. At this point thread-divergence will diminish any performance gain (except the points are evenly distributed).

Upvotes: 0

Related Questions