Reputation: 267
I am implementing a GPU-based particle system using the algorithm in this paper: http://www.gamasutra.com/view/feature/130535/building_a_millionparticle_system.php?print=1
There are two things I can't understand:
Upvotes: 1
Views: 615
Reputation: 2514
The phrase "particle is dead" only describes the semantic death of a particle. From the GPU's point of view, all particles are alive at all times, and all of them are processed in every frame (or at least particles 0 to N are processed, including the dead ones).
Once the CPU detects that a particle is dead (e.g. its age has reached 5 seconds), it has to remember that particle's index so that a new particle can reuse it. This can be done in many different ways; two obvious choices are a stack or a heap.
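As a rough illustration of the stack variant, here is a minimal sketch in C. The names free_stack, free_stack_init, free_stack_push and free_stack_pop, and the tiny MAX_PARTICLES capacity, are made up for this example; they do not come from the paper.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARTICLES 8 /* hypothetical capacity, tiny for illustration */

/* LIFO stack of particle indices that are free for reuse. */
struct free_stack {
    size_t slots[MAX_PARTICLES];
    size_t count;
};

/* Start with every index free; index 0 ends up on top of the stack. */
static void free_stack_init(struct free_stack *s) {
    s->count = 0;
    for (size_t i = MAX_PARTICLES; i > 0; --i) {
        s->slots[s->count++] = i - 1;
    }
}

/* Remember the index of a particle that just died. */
static void free_stack_push(struct free_stack *s, size_t index) {
    assert(s->count < MAX_PARTICLES);
    s->slots[s->count++] = index;
}

/* Take an index for a new particle; returns 0 if no index is free. */
static int free_stack_pop(struct free_stack *s, size_t *index) {
    if (s->count == 0) {
        return 0;
    }
    *index = s->slots[--s->count];
    return 1;
}
```

When a particle dies, the CPU pushes its index; when an emitter needs a new particle, it pops an index and writes the new position and velocity into that slot.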
A special data structure for the dead particle indices is only necessary if the maximum ages of the particles differ. If they are all the same, you can simply implement a ring buffer. Most of the time, though, you will use the particle engine for all kinds of particles with varying time-to-live values, and then you need such a data structure.
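To show why a ring buffer is enough when every particle has the same lifetime: particles then die in the order they were born, so the oldest slot is always the next one to become free. A minimal sketch, assuming a fixed lifetime counted in simulation ticks; ring_alloc, ring_init, ring_expire, ring_spawn, RING_CAPACITY and RING_TTL are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 4 /* hypothetical, tiny for illustration */
#define RING_TTL 3      /* hypothetical uniform lifetime in ticks */

struct ring_alloc {
    uint32_t birth_tick[RING_CAPACITY];
    size_t head; /* slot of the oldest live particle */
    size_t live; /* number of live particles */
};

static void ring_init(struct ring_alloc *r) {
    r->head = 0;
    r->live = 0;
    for (size_t i = 0; i < RING_CAPACITY; ++i) {
        r->birth_tick[i] = 0;
    }
}

/* Retire particles from the oldest end once their lifetime is over. */
static void ring_expire(struct ring_alloc *r, uint32_t now) {
    while (r->live > 0 && now - r->birth_tick[r->head] >= RING_TTL) {
        r->head = (r->head + 1) % RING_CAPACITY;
        r->live--;
    }
}

/* Claim the slot right after the newest live particle; returns 0 if full. */
static int ring_spawn(struct ring_alloc *r, uint32_t now, size_t *index) {
    assert(index != NULL);
    ring_expire(r, now);
    if (r->live == RING_CAPACITY) {
        return 0;
    }
    *index = (r->head + r->live) % RING_CAPACITY;
    r->birth_tick[*index] = now;
    r->live++;
    return 1;
}
```

No separate list of free indices is needed here; the head position and the live count describe the free range completely. As soon as lifetimes differ, particles no longer die in birth order and this shortcut stops working.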
The algorithm uses the fragment shader to do the velocity calculations. It reads data from one texture (which contains x/y/z components instead of r/g/b color information) and writes to a different texture (which likewise contains x/y/z components instead of r/g/b color information), using a 1:1 mapping between source and target texture, and renders the whole source texture to the target texture. This has nothing to do with the actual particles, which are rendered later in step 6, Render Particles.
In other words, "screen-sized quad" is actually a misleading term here; it should read "texture-sized quad", because at this point nothing is drawn to the screen at all. The target texture (i.e. the texture that will hold the new position information) IS the "screen".
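To make the 1:1 mapping concrete: each particle owns exactly one texel, and the fragment generated for a texel of the target texture reads the same texel of the source texture. A small sketch, assuming a row-major layout in a 1024 x 1024 texture; the layout and the names texel, texel_of_particle and particle_of_texel are assumptions for this example, not something taken from the paper.

```c
#include <assert.h>
#include <stddef.h>

#define TEX_WIDTH 1024  /* assumed texture width */
#define TEX_HEIGHT 1024 /* assumed texture height */

struct texel {
    size_t x, y;
};

/* Which texel stores the data of particle `index` (row-major layout). */
static struct texel texel_of_particle(size_t index) {
    assert(index < (size_t)TEX_WIDTH * TEX_HEIGHT);
    struct texel t = { index % TEX_WIDTH, index / TEX_WIDTH };
    return t;
}

/* Inverse mapping: which particle a fragment at texel `t` updates. */
static size_t particle_of_texel(struct texel t) {
    return t.y * TEX_WIDTH + t.x;
}
```

Drawing one quad that covers the whole target texture produces one fragment per texel, i.e. one shader invocation per particle slot.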
Edit:
OK, let me try rephrasing the paper as plain C:
You have a struct:
struct color {
    float r, g, b;
};
and a few #defines:
#define vector color
#define x r
#define y g
#define z b
And you have a few arrays for your particles:
#define NP (1024 * 1024)
struct vector particle_pos[2][NP];
struct vector particle_vel[2][NP];
uint32_t particle_birth_tick[NP];
// Double buffering: we have to remember where
// we read from and where we write to.
struct vector * particle_pos_r = particle_pos[0];
struct vector * particle_pos_w = particle_pos[1];
struct vector * particle_vel_r = particle_vel[0];
struct vector * particle_vel_w = particle_vel[1];
Now:
- Process Birth and Death
#define TTL (5 * 25) // 5 seconds * 25 simulation steps per second.
for (size_t i = 0; i < NP; ++i) {
    if (particle_birth_tick[i] + TTL == current_tick) {
        // Park the dead particle out of sight. behind_viewer is a
        // placeholder for any position somewhere behind the viewer.
        particle_pos_r[i] = behind_viewer;
        particle_vel_r[i].x = 0;
        particle_vel_r[i].y = 0;
        particle_vel_r[i].z = 0;
        // Remember the index so a new particle can reuse it.
        free_list.add(i);
    }
}
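void add_particle(struct vector p, struct vector v) {
    size_t i = free_list.pop_any();
    particle_pos_r[i] = p;
    particle_vel_r[i] = v;
    // Record the birth tick so the death check above can find it later.
    particle_birth_tick[i] = current_tick;
}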
- Update Velocities
for (size_t i = 0; i < NP; ++i) {
    particle_vel_w[i].x = do_calculations(particle_vel_r[i].x);
    particle_vel_w[i].y = do_calculations(particle_vel_r[i].y);
    particle_vel_w[i].z = do_calculations(particle_vel_r[i].z);
}
swap(particle_vel_r, particle_vel_w);
- Update Positions
for (size_t i = 0; i < NP; ++i) {
    particle_pos_w[i].x = particle_pos_r[i].x + particle_vel_r[i].x;
    particle_pos_w[i].y = particle_pos_r[i].y + particle_vel_r[i].y;
    particle_pos_w[i].z = particle_pos_r[i].z + particle_vel_r[i].z;
}
swap(particle_pos_r, particle_pos_w);
- Sort for Alpha Blending
sort a bit...
- Transfer Texture Data to Vertex Data
copy the position texture into a VBO
- Render Particles
actually draw particles
The interesting point here is that steps 2-5 all happen exclusively on the GPU (step 1 happens on both the GPU and the CPU). Hence the term "rendering": the loops in steps 2 and 3 just "render" the "texture" particle_vel_r or particle_pos_r into the "frame buffer" particle_vel_w or particle_pos_w respectively, completely filling the frame buffer (the "screen-sized quad") with the source texture.
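The read/write swap can also be written out as compilable C. This is only a CPU-side sketch of the idea, with a tiny hypothetical particle count; vec3, ping_pong, pp_init, pp_read, pp_write, pp_swap and update_positions are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define COUNT 4 /* hypothetical tiny particle count */

struct vec3 {
    float x, y, z;
};

/* Two buffers standing in for the source and target textures. */
struct ping_pong {
    struct vec3 buffers[2][COUNT];
    int read; /* buffer currently read from; the other one is written */
};

static void pp_init(struct ping_pong *p) {
    p->read = 0;
    for (int b = 0; b < 2; ++b) {
        for (size_t i = 0; i < COUNT; ++i) {
            p->buffers[b][i].x = 0.0f;
            p->buffers[b][i].y = 0.0f;
            p->buffers[b][i].z = 0.0f;
        }
    }
}

static struct vec3 *pp_read(struct ping_pong *p) { return p->buffers[p->read]; }
static struct vec3 *pp_write(struct ping_pong *p) { return p->buffers[1 - p->read]; }
static void pp_swap(struct ping_pong *p) { p->read = 1 - p->read; }

/* One "render pass": every element of the target is written exactly once,
 * reading only from the source, then the roles are swapped. */
static void update_positions(struct ping_pong *pos, const struct vec3 *vel) {
    assert(pos != NULL && vel != NULL);
    const struct vec3 *src = pp_read(pos);
    struct vec3 *dst = pp_write(pos);
    for (size_t i = 0; i < COUNT; ++i) {
        dst[i].x = src[i].x + vel[i].x;
        dst[i].y = src[i].y + vel[i].y;
        dst[i].z = src[i].z + vel[i].z;
    }
    pp_swap(pos);
}
```

Because a pass never reads from the buffer it writes to, the order in which elements are processed does not matter, which is what lets the GPU run all fragments in parallel.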
Upvotes: 2