Reputation: 1619
We are working on an assignment for a GPGPU course. We picked an algorithm, implemented it on the CPU and are now converting it to OpenCL.
The algorithm we've chosen loads a model as a set of triangles and rasterizes them to voxels. The voxels are defined as a VBO of point data. We then use a geometry shader to convert these points to voxels as triangles.
So our OpenCL program needs to take a list of triangles and output a variable list of points.
And outputting a variable length array seems to be a problem.
The solution we found is to atomically increment a counter and use that counter as both an index into the output array and a final size of the array. Except... both our GPU's don't support the extension for atomic operations.
This is what we have so far:
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable
#pragma OPENCL EXTENSION cl_khr_int64_extended_atomics : enable
#define POS1 i0 * 3 + 0
#define POS2 i0 * 3 + 1
#define POS3 i0 * 3 + 2
void WritePosition( __global float* OutBuffer, uint inIndex, __global float* inPosition )
{
OutBuffer[ inIndex * 3 ] = inPosition[0];
OutBuffer[ inIndex * 3 + 1] = inPosition[1];
OutBuffer[ inIndex * 3 + 2] = inPosition[2];
}
__kernel void Voxelize(
__global float* outPointcloudBuffer,
__global float* inTriangleBuffer,
__global uint* inoutIndex
)
{
size_t i0 = get_global_id(0);
size_t i1 = get_local_id(0);
WritePosition( outPointcloudBuffer, inIndex[0], &inTriangleBuffer[ i0 ] );
//atomic_inc(inoutIndex[0]);
inoutIndex[0] = max(inoutIndex[0], i0);
}
And the output of this is very odd. We're testing a very small model (12 triangles, 36 positions, 108 floats) and the result we get is either 31, 63 or 95. Always a multiple of 16 minus 1.
How can we get the length of our variable length output array?
Thanks in advance.
Upvotes: 2
Views: 2088
Reputation: 725
I would guess that this is normally tackled as follows:
You might want to have a look at NVIDIA's OpenCL marching cubes implementation where all three passes mentioned above are implemented.
Best, Christoph
Upvotes: 5