I have a compute shader that culls object triangles against frustums.
For culling, I use one large vertex buffer and one large index buffer, plus an (offset, count) pair per object that identifies the range of indices belonging to that object.
When I dispatch the compute shader, I pass the number of objects as the X dimension of the Dispatch call, so each thread group handles one object.
The shader itself walks the index buffer, loads the 3 indices of each triangle, builds the AABB of the triangle (by reading the world positions of those vertices from the vertex buffer) and tests it against the frustum.
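To make the AABB step concrete, here is a minimal sketch of such a test. The plane layout (six inward-facing planes stored as `float4`) and the names `frustum_planes` and `test_triangle_aabb` are assumptions for illustration; this is not the actual `test_triangle` used in my shader.

```hlsl
// Hypothetical frustum layout: xyz = inward-facing plane normal, w = plane distance,
// so a point p is on the inside of a plane when dot(plane.xyz, p) + plane.w >= 0.
struct frustum_planes {
    float4 planes[6];
};

// Returns false only when the triangle's AABB lies completely behind one plane.
bool test_triangle_aabb(frustum_planes f, float3 p0, float3 p1, float3 p2) {
    // Axis-aligned bounding box of the three world-space positions.
    float3 box_min = min(p0, min(p1, p2));
    float3 box_max = max(p0, max(p1, p2));

    for (uint i = 0; i < 6; ++i) {
        float3 n = f.planes[i].xyz;
        // Box corner that lies farthest along the plane normal.
        float3 corner = float3(n.x >= 0 ? box_max.x : box_min.x,
                               n.y >= 0 ? box_max.y : box_min.y,
                               n.z >= 0 ? box_max.z : box_min.z);
        if (dot(n, corner) + f.planes[i].w < 0) {
            return false; // even the farthest corner is outside this plane
        }
    }
    return true;
}
```

A test like this is conservative: a box near a frustum corner can pass even though it is fully outside, which only costs a few extra triangles.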
The indices that pass the culling are written into a separate, new buffer at location original_offset+n, where n is a local variable containing the new index.
The shader currently culls one object with Y triangles per thread group (N groups are dispatched for N objects), using a single compute thread (numthreads(1,1,1)).
I want to cull triangles in parallel within a single Dispatch Group. But I am not sure how to write the new indices.
Here is an illustration of what I want to achieve:
I do not know how many triangles thread n produces, so I do not know where thread n+1 should start writing to.
How can I achieve something like this?
My shader looks like this (details left out)
[numthreads(1,1,1)]
void main(uint3 group_index : SV_GroupID){
    uint node_id = group_index.x;
    uint room = room_ids[node_id];
    room_vertex_range range = room_ranges[room];
    uint next_culled_index = 0; // number of indices written so far for this object
    const uint vertex_stride = 40; // bytes per vertex; position is the first 12 bytes
    for(uint idx = 0; idx < range.count; idx += 3) { // load every triangle
        uint vertex0_id = room_index_buffer[idx + range.offset];
        uint vertex1_id = room_index_buffer[idx + 1 + range.offset];
        uint vertex2_id = room_index_buffer[idx + 2 + range.offset];
        uint vertex0_offset = vertex0_id * vertex_stride;
        uint vertex1_offset = vertex1_id * vertex_stride;
        uint vertex2_offset = vertex2_id * vertex_stride;
        // Load3 returns raw uint3 bits; asfloat reinterprets them as the float position
        float3 pos0 = asfloat(room_vertex_buffer.Load3(vertex0_offset));
        float3 pos1 = asfloat(room_vertex_buffer.Load3(vertex1_offset));
        float3 pos2 = asfloat(room_vertex_buffer.Load3(vertex2_offset));
        if(test_triangle(frustum, pos0, pos1, pos2)) { // write passed indices into new index buffer
            culled_room_index_buffer[range.offset + next_culled_index] = vertex0_id;
            culled_room_index_buffer[range.offset + next_culled_index + 1] = vertex1_id;
            culled_room_index_buffer[range.offset + next_culled_index + 2] = vertex2_id;
            next_culled_index += 3;
        }
    }
    // one 20-byte indexed indirect draw record per object
    room_draw_args.Store(group_index.x * 20, next_culled_index); // index count
    room_draw_args.Store(group_index.x * 20 + 4, 1);             // instance count
    room_draw_args.Store(group_index.x * 20 + 8, range.offset);  // start index location
    room_draw_args.Store(group_index.x * 20 + 12, 0);            // base vertex location
    room_draw_args.Store(group_index.x * 20 + 16, 0);            // start instance location
}
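One possible direction, as an untested sketch rather than a confirmed solution: keep one thread group per object, let the threads of the group stride over the triangles, and reserve output slots with an atomic add on a `groupshared` counter. The resource types, register slots, `GROUP_SIZE`, the `cull_planes` constant buffer and the stand-in `test_triangle_sketch` are all assumptions; only the buffer names and the 40-byte stride come from the shader above.

```hlsl
#define GROUP_SIZE 64 // assumed group size, tune per GPU

struct room_vertex_range { uint offset; uint count; };

// Assumed bindings; names mirror the original shader.
StructuredBuffer<uint>              room_ids                 : register(t0);
StructuredBuffer<room_vertex_range> room_ranges              : register(t1);
Buffer<uint>                        room_index_buffer        : register(t2);
ByteAddressBuffer                   room_vertex_buffer       : register(t3);
RWBuffer<uint>                      culled_room_index_buffer : register(u0);
RWByteAddressBuffer                 room_draw_args           : register(u1);

cbuffer cull_constants : register(b0) {
    float4 cull_planes[6]; // assumed: xyz = inward normal, w = distance
};

// Shared by all threads of the group: number of indices written so far.
groupshared uint gs_index_count;

// Stand-in for the real test_triangle: conservative AABB vs. plane test.
bool test_triangle_sketch(float3 p0, float3 p1, float3 p2) {
    float3 box_min = min(p0, min(p1, p2));
    float3 box_max = max(p0, max(p1, p2));
    for (uint i = 0; i < 6; ++i) {
        float3 n = cull_planes[i].xyz;
        float3 corner = float3(n.x >= 0 ? box_max.x : box_min.x,
                               n.y >= 0 ? box_max.y : box_min.y,
                               n.z >= 0 ? box_max.z : box_min.z);
        if (dot(n, corner) + cull_planes[i].w < 0) return false;
    }
    return true;
}

[numthreads(GROUP_SIZE, 1, 1)]
void main(uint3 group_index : SV_GroupID, uint thread_index : SV_GroupIndex) {
    uint room = room_ids[group_index.x];
    room_vertex_range range = room_ranges[room];
    const uint vertex_stride = 40;

    // Reset the shared counter once, then make it visible to the whole group.
    if (thread_index == 0) gs_index_count = 0;
    GroupMemoryBarrierWithGroupSync();

    // Thread t handles triangles t, t + GROUP_SIZE, t + 2 * GROUP_SIZE, ...
    for (uint idx = thread_index * 3; idx < range.count; idx += GROUP_SIZE * 3) {
        uint v0 = room_index_buffer[range.offset + idx];
        uint v1 = room_index_buffer[range.offset + idx + 1];
        uint v2 = room_index_buffer[range.offset + idx + 2];
        float3 pos0 = asfloat(room_vertex_buffer.Load3(v0 * vertex_stride));
        float3 pos1 = asfloat(room_vertex_buffer.Load3(v1 * vertex_stride));
        float3 pos2 = asfloat(room_vertex_buffer.Load3(v2 * vertex_stride));

        if (test_triangle_sketch(pos0, pos1, pos2)) {
            // Atomically reserve 3 slots; 'slot' receives the counter value before the add.
            uint slot;
            InterlockedAdd(gs_index_count, 3, slot);
            culled_room_index_buffer[range.offset + slot]     = v0;
            culled_room_index_buffer[range.offset + slot + 1] = v1;
            culled_room_index_buffer[range.offset + slot + 2] = v2;
        }
    }

    // Wait until every thread has finished reserving and writing.
    GroupMemoryBarrierWithGroupSync();
    if (thread_index == 0) {
        room_draw_args.Store(group_index.x * 20,      gs_index_count); // index count
        room_draw_args.Store(group_index.x * 20 + 4,  1);
        room_draw_args.Store(group_index.x * 20 + 8,  range.offset);
        room_draw_args.Store(group_index.x * 20 + 12, 0);
        room_draw_args.Store(group_index.x * 20 + 16, 0);
    }
}
```

With this scheme no thread needs to know how many triangles the others produce, but the surviving triangles end up in an arbitrary order within the object's range. If the original order matters (for example for blended geometry), a per-group prefix sum over the pass/fail flags would be needed instead of the atomic counter.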