Reputation: 1759
I'm using OpenCL (via JOCL) to find minima in a bunch of distance calculations for ray marching. The pseudocode would look something like this:
Start with a point in 3d space.
There are a number of functions to calculate distances
to that point from various other points.
These may be rather complex (transforms, csg etc).
Calculate all of the distances, perhaps into an array
Get the index of the minimum distance in the array.
Use that index to look up other stuff (pigmentation etc).
My implementation is kinda crap though. I don't currently parallelize the distance calculations but I would like to. Here's why I don't:
It's easy enough to get the minimum distance, but to retrieve this index is not obvious. I ended up iterating over the distances and keeping track of the current minimum and its index, but this is obviously garbage in a parallel environment.
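For reference, the sequential scan described above can be sketched in plain C as follows. This is only a minimal illustration of the "track the current minimum and its index" loop; the name `argmin_scan` is hypothetical and not taken from any actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the index of the smallest value in
 * dist[0..n-1]. On ties, the lowest index wins. Requires n > 0. */
size_t argmin_scan(const float *dist, size_t n) {
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; ++i) {
        /* Strict '<' keeps the earlier index when two distances are equal. */
        if (dist[i] < dist[best]) {
            best = i;
        }
    }
    return best;
}
```

Each iteration depends on the result of the previous one, which is why this form does not map directly onto parallel work-items.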
I could basically use a tip to steer me in the right direction here, or tell me if I'm barking up the wrong tree entirely (e.g. is this a CPU job?).
Thanks!
Upvotes: 1
Views: 1292
Reputation: 11926
Tested with an RX550, which is a low-end graphics card.
1-million element min() function:
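__kernel void test(__global float * data, __global int * index)
{
    // One reduction pass: work-item id compares element id with element id + global size.
    int id = get_global_id(0);
    int other = id + get_global_size(0);
    float d1 = data[id];
    float d2 = data[other];
    // Keep the index that belongs to the smaller distance (ties keep index[id]).
    index[id] = (d1 <= d2) ? index[id] : index[other];
    data[id] = fmin(d1, d2);
}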
Initialized data elements with random values and index elements with their own indices.
Uploading data and index to the GPU through PCIe 2.0 x8 took: 3.0 ms
Computing with global range=512k,256k,128k,...,1 (log N steps) took: 0.3 ms
Downloading data[0] and index[0] took: 0.002 ms
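The enqueue schedule for those passes is easy to get wrong, so here is a CPU-side sketch in plain C that emulates it. It is not the JOCL host code used for the timings; `reduce_item` and `reduce_min_index` are hypothetical names, and the inner loop stands in for work-items that would run in parallel on the GPU. It assumes the element count is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* CPU stand-in for one work-item of a single pass: compares element id
 * with element id + global_size and keeps the smaller value and its
 * index in slot id. */
static void reduce_item(float *data, int *index, size_t id, size_t global_size) {
    float d1 = data[id];
    float d2 = data[id + global_size];
    if (d2 < d1) {
        data[id] = d2;
        index[id] = index[id + global_size];
    }
}

/* Emulates the pass schedule: global size n/2, n/4, ..., 1.
 * n must be a power of two. Afterwards data[0] holds the minimum and
 * index[0] its original position. Returns the number of passes. */
int reduce_min_index(float *data, int *index, size_t n) {
    assert(n > 0 && (n & (n - 1)) == 0);
    int passes = 0;
    for (size_t gs = n / 2; gs >= 1; gs /= 2) {
        /* On the GPU these iterations are independent work-items. */
        for (size_t id = 0; id < gs; ++id) {
            reduce_item(data, index, id, gs);
        }
        ++passes;
    }
    return passes;
}
```

Within one pass, work-item id only writes slot id and only reads slots id and id + global size, so no two work-items touch the same slot. For 2^20 elements this schedule needs 20 passes.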
This is a straightforward version and may not be the fastest implementation. To go faster, a workgroup-level sub-reduction can be added to reduce the number of kernel enqueue commands, which might finish the job in less than a hundred microseconds (an untested guess).
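To illustrate why a workgroup-level sub-reduction cuts the number of enqueues, here is a CPU-side sketch in plain C. It is an assumption about how such a scheme could be structured, not tested GPU code: each emulated group reduces a chunk of `chunk` elements to one (distance, index) pair, and the passes ping-pong between two buffers so that a group never overwrites data another group still has to read. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One pass: every emulated work-group scans `chunk` consecutive input
 * elements and writes one winner to the output. Returns the group count.
 * On a GPU the inner scan would be a tree reduction in local memory. */
static size_t reduce_pass(const float *in, const int *in_idx, size_t n,
                          size_t chunk, float *out, int *out_idx) {
    size_t groups = (n + chunk - 1) / chunk;
    for (size_t g = 0; g < groups; ++g) {
        size_t begin = g * chunk;
        size_t end = (begin + chunk < n) ? begin + chunk : n;
        size_t best = begin;
        for (size_t i = begin + 1; i < end; ++i) {
            if (in[i] < in[best]) best = i;
        }
        out[g] = in[best];
        out_idx[g] = in_idx[best];
    }
    return groups;
}

/* Repeats passes until one element is left, swapping the roles of the
 * (a, ai) and (b, bi) buffers. b and bi need room for
 * ceil(n / chunk) elements. Returns the number of passes. */
int grouped_min_index(float *a, int *ai, float *b, int *bi, size_t n,
                      size_t chunk, float *min_out, int *idx_out) {
    assert(n > 0 && chunk >= 2);
    int passes = 0;
    while (n > 1) {
        n = reduce_pass(a, ai, n, chunk, b, bi);
        float *tf = a; a = b; b = tf;    /* output becomes next input */
        int *ti = ai; ai = bi; bi = ti;
        ++passes;
    }
    *min_out = a[0];
    *idx_out = ai[0];
    return passes;
}
```

With 2^20 elements and a chunk of 256, this takes 3 passes (2^20 to 4096 to 16 to 1) instead of 20, which is where the saving in enqueue overhead would come from.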
Upvotes: 0