How to use parallel child processes to perform "work" on a large array?

I have a huge array of numbers. I want to compute a sum of all of the numbers using JavaScript / Node.js. (For the purposes of this question it's a simple sum; in reality I have a much more complex and lengthy mathematical operation to perform).

In a single-threaded world, computing the sum takes a long time. To crunch the result quicker, I've been trying to delegate the work to multiple child processes running in parallel. Each child process determines the sum of a sub-array, and everything is totalled in the parent process.

My two scripts are below:

index.js

function computeSum(data) {
  var start = new Date();
  var sum = data.reduce(function(a, b) { return a + b; });
  console.log("Sum = %s, Time = %s ms", sum, new Date().getTime() - start.getTime());
}

function computeSumConcurrent(data) {
  var child_process = require("child_process");
  var os = require("os");
  var cpuCount = os.cpus().length;
  var subArraySize = data.length / cpuCount;
  var childProcessesFinished = 0;
  var start = new Date();
  var sum = 0;

  for (var i = 0; i < cpuCount; i++) {
    var childProcess = child_process.fork("child.js");

    childProcess.on("message", function(message) {
      sum += message.sum;
      childProcessesFinished++;

      if (childProcessesFinished == cpuCount) {
        console.log("Sum = %s, Time = %s ms", sum, new Date().getTime() - start.getTime());
        process.exit();
      }
    });

    childProcess.send({ subArray: data.slice(subArraySize * i, subArraySize * (i + 1)) });
  }
}

console.log("Populating array...");
var data = []
for (var i = 0; i < 50000000; i++) {
  data.push(Math.random());
}

console.log("Computing sum without using child processes...");
computeSum(data);

console.log("Computing sum using child processes...");
computeSumConcurrent(data);

child.js

process.on("message", function(message) {
  var sum = message.subArray.reduce(function(a, b) { return a + b; });
  process.send({ sum: sum });
  process.exit();
});

If you run index.js, you'll observe that the parallel sum is extremely slow. I think it may be due to childProcess.send, which isn't meant to communicate large chunks of data, but I'm not entirely sure.

So what's the solution for this sort of thing? How can I make the parallel sum quicker than the single-threaded one?

Upvotes: 1

Views: 1348

Answers (1)

Naeem Shaikh
Naeem Shaikh

Reputation: 15715

Creating child processes for small work and sending and receiving messages can in fact increase the duration or processing as there is time required for sending and receiving messages.

There is another problem as well, in your code, that you are actually dividing the work for children from the main process itself, this is not going to make your main process free from work, but only increase it more.

I would suggest you another approach.

  1. create a child process send it all the data.

  2. let the child itself create other children and assign work to them. and then calculate all the results.

  3. Send the final result to the parent.

Note:

  1. make sure you don't create too many children and assign very small work to them. this will only make you send and receive too many messages, and thus will delay the processing.

  2. Also don't create too less child forks that they themselves require too much of time processing the given task.

  3. you will have to make sure any child process doesnt exit before the children it has sreated itself

Benefits:

  1. your main process will not be busy in dividing tasks and calculating the result.
  2. A fine number of children forked will make your task complete in relatively short duration (I don't say very short duration)

Feel free to ask if you need some example.

Upvotes: 2

Related Questions