Reputation: 913
I'm trying to re-implement the shared-memory n-body simulation presented in section 6.1.6 of An Introduction to Parallel Programming by Peter Pacheco, where it is implemented using OpenMP.
Here is my parallel implementation using OpenMP, and here is a serial implementation using Chapel. I'm having trouble writing a shared-memory parallel implementation in Chapel. Since there is no way to get the rank of a thread inside a forall loop, I can't use the same approach as in the OpenMP implementation. I would instead have to use a coforall loop, create the tasks myself, and distribute the iterations manually. That does not seem practical and suggests that there is a more elegant way to solve this within Chapel.
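For concreteness, the manual approach I mean would look roughly like this (just a sketch with hand-computed per-task bounds; the force computation itself is elided):

coforall tid in 0..#here.maxTaskPar {
  const numTasks = here.maxTaskPar;
  // block-distribute the bodies by hand, OpenMP-style
  const myLo = tid * n_bodies / numTasks;
  const myHi = (tid + 1) * n_bodies / numTasks - 1;
  for q in myLo..myHi {
    // ... compute this task's contribution to forces[q] ...
  }
}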
I'm looking for guidance and suggestions on how to better solve this problem using the tools provided by Chapel.
Upvotes: 3
Views: 247
Reputation: 4008
My suggestion would be to use a + reduce intent on forces in your forall loop, which will give each task its own private copy of forces and then (sum-)reduce those individual copies back into the original forces variable as the tasks complete. This would be done by attaching the following with-clause to your forall loop:
forall q in 0..#n_bodies with (+ reduce forces) {
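As a tiny, self-contained illustration of what the reduce intent does (this example is not part of the n-body code; hist and the loop bounds are made up):

var hist: [0..#4] int;
// each task operates on its own zero-initialized copy of hist;
// the copies are summed element-wise back into hist as tasks finish
forall i in 1..100 with (+ reduce hist) {
  hist[i % 4] += 1;
}
writeln(hist);   // 25 25 25 25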
While here, I looked for other ways to make the code a bit more elegant and would suggest changing from a 2D array to an array-of-arrays for this problem, in order to collapse a bunch of trios of similar statements for the x, y, z components down to a single statement. I also made use of your pDomain variable and created a type alias for [0..#3] real in order to remove some redundancy in the code. Oh, and I removed the uses of the Math and IO modules because they are auto-used in Chapel programs.
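To illustrate the array-of-arrays change: where a 2D array needs one statement per coordinate (the 2D lines below are a hypothetical reconstruction of that pattern, not your original code), the array-of-arrays version handles the whole vector at once:

// 2D array: one statement per coordinate
forces[q, X] += force_qk[X];
forces[q, Y] += force_qk[Y];
forces[q, Z] += force_qk[Z];

// array-of-arrays: a single whole-vector statement
forces[q] += force_qk;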
Here's where that left me:
config const filename = "input.txt";
config const iterations = 100;
config const out_filename = "out.txt";
const X = 0;
const Y = 1;
const Z = 2;
const G = 6.67e-11;
config const dt = 0.1;
// Read input file, initialize bodies
var f = open(filename, iomode.r);
var reader = f.reader();
var n_bodies = reader.read(int);
const pDomain = {0..#n_bodies};
type vec3 = [0..#3] real;
var forces: [pDomain] vec3;
var velocities: [pDomain] vec3;
var positions: [pDomain] vec3;
var masses: [pDomain] real;
for i in pDomain {
  positions[i] = reader.read(vec3);
  velocities[i] = reader.read(vec3);
  masses[i] = reader.read(real);
}
reader.close();
f.close();
for i in 0..#iterations {
  // Reset forces
  forces = [0.0, 0.0, 0.0];

  forall q in pDomain with (+ reduce forces) {
    for k in pDomain {
      // only compute each pair once (k > q); the symmetric update
      // below applies the force to both bodies
      if k <= q {
        continue;
      }
      var diff = positions[q] - positions[k];
      var dist = sqrt(diff[X]**2 + diff[Y]**2 + diff[Z]**2);
      var dist_cubed = dist**3;
      var tmp = -G * masses[q] * masses[k] / dist_cubed;
      var force_qk = tmp * diff;

      forces[q] += force_qk;
      forces[k] -= force_qk;
    }
  }

  // update positions and velocities from the freshly computed forces
  forall q in pDomain {
    positions[q] += dt * velocities[q];
    velocities[q] += dt / masses[q] * forces[q];
  }
}
var outf = open(out_filename, iomode.cw);
var writer = outf.writer();
for q in pDomain {
  writer.writeln("%er %er %er %er %er %er".format(
    positions[q][X], positions[q][Y], positions[q][Z],
    velocities[q][X], velocities[q][Y], velocities[q][Z]));
}
writer.close();
outf.close();
One other change you could consider making would be to replace the forall-loop that updates positions and velocities with the following whole-array statements:
positions += dt * velocities;
velocities += dt / masses * forces;
where the main tradeoff is that the forall implements the statements in a fused manner using a single parallel loop, while the whole-array statements would not (at least in the current 1.18 version of the compiler).
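In other words, the two whole-array statements behave roughly like two back-to-back parallel loops rather than one (a sketch of the effect, not of the generated code):

forall q in pDomain do positions[q] += dt * velocities[q];
forall q in pDomain do velocities[q] += dt / masses[q] * forces[q];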
Upvotes: 4