Reputation: 913
I'm trying to re-implement the shared-memory n-body simulation presented in section 6.1.6 of An Introduction to Parallel Programming by Peter Pacheco, where it is implemented using OpenMP.
Here is my parallel implementation using OpenMP, and here is a serial implementation using Chapel. I'm having trouble writing a shared-memory parallel implementation in Chapel. Since there is no way to get the rank of a thread inside a forall loop, I can't use the same approach as in the OpenMP implementation. I would instead have to use a coforall loop, create the tasks myself, and distribute the iterations manually. That does not seem practical and suggests that there is a more elegant way to solve this within Chapel.
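For concreteness, the manual approach I mean would look roughly like this (just a sketch with hand-computed per-task bounds; the force computation itself is elided):

coforall tid in 0..#here.maxTaskPar {
  const numTasks = here.maxTaskPar;
  // block-distribute the bodies by hand, OpenMP-style
  const myLo = tid * n_bodies / numTasks;
  const myHi = (tid + 1) * n_bodies / numTasks - 1;
  for q in myLo..myHi {
    // ... compute this task's contribution to forces[q] ...
  }
}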
I'm looking for guidance and suggestions on how to better solve this problem using the tools provided by Chapel.
Upvotes: 3
Views: 247
Reputation: 4008
My suggestion would be to use a + reduce intent on forces in your forall loop, which will give each task its own private copy of forces and then (sum-)reduce those individual copies back into the original forces variable as the tasks complete. This would be done by attaching the following with-clause to your forall loop:
forall q in 0..#n_bodies with (+ reduce forces) {
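As a tiny, self-contained illustration of what the reduce intent does (this example is not part of the n-body code; hist and the loop bounds are made up):

var hist: [0..#4] int;
// each task operates on its own zero-initialized copy of hist;
// the copies are summed element-wise back into hist as tasks finish
forall i in 1..100 with (+ reduce hist) {
  hist[i % 4] += 1;
}
writeln(hist);   // 25 25 25 25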
While here, I looked for other ways to make the code a bit more elegant and would suggest changing from a 2D array to an array-of-arrays for this problem, in order to collapse a bunch of trios of similar statements for the x, y, z components down to a single statement. I also made use of your pDomain variable and created a type alias for [0..#3] real in order to remove some redundancy in the code. Oh, and I removed the uses of the Math and IO modules because they are auto-used in Chapel programs.
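To illustrate the array-of-arrays change: where a 2D array needs one statement per coordinate (the 2D lines below are a hypothetical reconstruction of that pattern, not your original code), the array-of-arrays version handles the whole vector at once:

// 2D array: one statement per coordinate
forces[q, X] += force_qk[X];
forces[q, Y] += force_qk[Y];
forces[q, Z] += force_qk[Z];

// array-of-arrays: a single whole-vector statement
forces[q] += force_qk;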
Here's where that left me:
config const filename = "input.txt";
config const iterations = 100;
config const out_filename = "out.txt";
const X = 0;
const Y = 1;
const Z = 2;
const G = 6.67e-11;
config const dt = 0.1;
// Read input file, initialize bodies
var f = open(filename, iomode.r);
var reader = f.reader();
var n_bodies = reader.read(int);
const pDomain = {0..#n_bodies};
type vec3 = [0..#3] real;
var forces: [pDomain] vec3;
var velocities: [pDomain] vec3;
var positions: [pDomain] vec3;
var masses: [pDomain] real;
for i in pDomain {
  positions[i] = reader.read(vec3);
  velocities[i] = reader.read(vec3);
  masses[i] = reader.read(real);
}
reader.close();
f.close();
for i in 0..#iterations {
  // Reset forces
  forces = [0.0, 0.0, 0.0];

  forall q in pDomain with (+ reduce forces) {
    for k in pDomain {
      // only compute each pair once (k > q); the symmetric update
      // below applies the force to both bodies
      if k <= q {
        continue;
      }
      var diff = positions[q] - positions[k];
      var dist = sqrt(diff[X]**2 + diff[Y]**2 + diff[Z]**2);
      var dist_cubed = dist**3;
      var tmp = -G * masses[q] * masses[k] / dist_cubed;
      var force_qk = tmp * diff;

      forces[q] += force_qk;
      forces[k] -= force_qk;
    }
  }

  // update positions and velocities from the freshly computed forces
  forall q in pDomain {
    positions[q] += dt * velocities[q];
    velocities[q] += dt / masses[q] * forces[q];
  }
}
var outf = open(out_filename, iomode.cw);
var writer = outf.writer();
for q in pDomain {
  writer.writeln("%er %er %er %er %er %er".format(
    positions[q][X], positions[q][Y], positions[q][Z],
    velocities[q][X], velocities[q][Y], velocities[q][Z]));
}
writer.close();
outf.close();
One other change you could consider making would be to replace the forall-loop that updates positions and velocities with the following whole-array statements:
positions += dt * velocities;
velocities += dt / masses * forces;
where the main tradeoff is that the forall implements the statements in a fused manner using a single parallel loop, while the whole-array statements would not (at least in the current 1.18 version of the compiler).
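In other words, the two whole-array statements behave roughly like two back-to-back parallel loops rather than one (a sketch of the effect, not of the generated code):

forall q in pDomain do positions[q] += dt * velocities[q];
forall q in pDomain do velocities[q] += dt / masses[q] * forces[q];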
Upvotes: 4