Thomas

Reputation: 4719

Simulating threads in Node.js

I have a Node.js application that caches data from a web service. I also have a queue that receives approximately 500 items, which need to be processed as quickly as possible. By "processed", I mean that each item represents one HTTP request to be made, whose response must then be cached.

Now, the single-threaded architecture of Node is not ideal for this scenario. Ideally, I would like to spawn 5-10 "threads" to process the queue as quickly as possible. I read there is a child_process module that can fork processes, but I have never used it. Could this module help?

Can anyone suggest a solution for this problem?

Upvotes: 0

Views: 896

Answers (2)

GottZ

Reputation: 4947

Child processes created with child_process.fork() are simply new node processes running the same or a different script. You can use that API to spawn system processes as well, but that's not what I will describe here.

They behave like true Node.js processes because that's what they are.

There is one big downside:

Keep in mind that spawning a node process takes a lot of time and resources, so it is usually faster either to compute the data within one node process or to spawn worker children once and hand work to them. As you can see in the documentation, you can send data to and receive data from a child process, which lets you delegate work to children that are already running.
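As a rough illustration of delegating work to already-spawned children, here is a minimal sketch of a small worker pool. The names runPool and workerSource are made up for this example, the "work" is a placeholder string instead of a real HTTP request, and the worker script is written to a temp file only so that everything fits in one snippet. It does no error handling, so a worker that crashes would lose its item.

```javascript
const { fork } = require("child_process");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical worker script. In a real project this would live in its own
// file; it is written to a temp file here to keep the example self-contained.
const workerSource = [
  'process.on("message", function (item) {',
  '  // Placeholder for the real work (the HTTP request and caching).',
  '  process.send({ item: item, result: "processed " + item });',
  '});'
].join("\n");

// Fork poolSize workers once, then feed them items until the queue is empty.
function runPool(items, poolSize, done) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "pool-"));
  const workerPath = path.join(dir, "worker.js");
  fs.writeFileSync(workerPath, workerSource);

  const queue = items.slice();
  const results = [];
  let alive = poolSize;

  // Give the worker its next item, or close its channel so it can exit.
  function dispatch(worker) {
    if (queue.length === 0) {
      worker.disconnect();
      return;
    }
    worker.send(queue.shift());
  }

  for (let i = 0; i < poolSize; i++) {
    const worker = fork(workerPath);
    worker.on("message", function (msg) {
      results.push(msg);
      dispatch(worker);
    });
    worker.on("exit", function () {
      alive--;
      if (alive === 0) {
        // All workers are gone: clean up and report.
        fs.unlinkSync(workerPath);
        fs.rmdirSync(dir);
        done(results);
      }
    });
    dispatch(worker);
  }
}

runPool(["a", "b", "c", "d", "e"], 2, function (results) {
  console.log(results.length + " items processed");
});
```

The results arrive in whatever order the workers finish, so each message carries its item along with the result.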

Forked child processes usually share the stdin and stdout of the process that spawned them, unless you change that. Just take a look at the documentation; it's very well documented and easy to work with.

child_process documentation
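To show what "changing it" can look like, here is a small sketch that uses the stdio option of spawn to capture the child's stdout through a pipe while letting stderr go to the parent's stream. The helper name captureStdout is made up for this example.

```javascript
const { spawn } = require("child_process");

// Run a short script in a new node process and collect its stdout.
// stdio is [stdin, stdout, stderr]: "pipe" gives the parent a stream to read,
// "inherit" shares the parent's own stream, "ignore" leaves it unconnected.
function captureStdout(script, callback) {
  const child = spawn(process.execPath, ["-e", script], {
    stdio: ["ignore", "pipe", "inherit"]
  });

  let output = "";
  child.stdout.setEncoding("utf8");
  child.stdout.on("data", function (chunk) {
    output += chunk;
  });

  // "close" fires after the process has ended and its streams are closed.
  child.on("close", function (code) {
    callback(code, output);
  });
}

captureStdout('console.log("hello from the child")', function (code, output) {
  console.log("exit code " + code + ", captured: " + output.trim());
});
```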

I've never made worker children, but I've made something like this, which you may find useful. The script forks itself as a child and respawns that child whenever the script file changes:

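// The parent runs this branch; the forked child is started with the "child" argument.
if (process.argv.indexOf("child") === -1) {
  // Work from the script's own directory.
  process.chdir(__dirname);
  var child;
  var spawn = function () {
    console.log("spawning child process " + new Date());
    child = require("child_process").fork(module.filename, ["child"]);
    // Respawn whenever the child goes away.
    child.on("close", function () {
      spawn();
    });
  };
  spawn();

  // Take the child down together with the parent.
  process.on("exit", function () {
    child.kill();
  });
  return;
}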

// child code begins here

var fs = require("fs");

fs.watch(process.argv[1], function () {
  process.exit();
});

Upvotes: 2

Sal Rahman

Reputation: 4748

The child_process module can do roughly what you want.

The only issue is that you literally spawn new processes, so there is a memory overhead to consider. Assuming you want the elegance of defining your subroutines within the same file, you can pass a JavaScript string to the node command.

So this is exactly what we will do. But first, let's create a function that accepts a JSON-compatible object and a function, and then runs that function in a new process (our "thread"):

var child_process = require('child_process');

function startThread(data, fn, callback) {
  var fnStr = '(' + fn.toString() + ')(' + JSON.stringify(data) + ');';

  var node = child_process.spawn('node', ['-e', fnStr]);

  var output = [];

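  // Collect each chunk as-is; trimming per chunk could glue words together
  // when the output arrives in more than one chunk.
  var onData = function (data) {
    output.push(data.toString('utf8'));
  };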

  node.stdout.on('data', onData);
  node.stderr.on('data', onData);

  node.on('close', function (code) {
    callback(code, output);
  });
}

As an example, we will spawn a new "thread" to generate the lyrics of the "99 bottles of beer" song:

startThread({ doFor: '99' }, function (data) {
  var str = '';
  while (data.doFor) {
    str += data.doFor + ' bottles of beer on the wall ' + data.doFor +
    ' bottles of beer. You take one out, toss it around, ';
    data.doFor--;
    str += data.doFor + ' bottles of beer on the wall\n';
  }
  console.log(str.trim());
}, function (code, outputs) {
  console.log(outputs.join(''));
});

Unfortunately, the function that runs in the other "thread" doesn't have access to variables in the parent, because only its source text is handed to the new process.

Also, results come back only through STDOUT and STDERR.
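One way to live with that limitation is to have the child print its result as JSON and parse it in the parent. The following is only a sketch along those lines: runInProcess is a made-up name, it assumes the function returns something JSON-compatible, and it keeps stderr separate so that error text doesn't end up in the parsed output.

```javascript
const { spawn } = require("child_process");

// Run fn(data) in a separate node process and hand its return value back,
// serialized as JSON on stdout. fn cannot see any variables of the parent.
function runInProcess(data, fn, callback) {
  const source =
    "var result = (" + fn.toString() + ")(" + JSON.stringify(data) + ");" +
    "process.stdout.write(JSON.stringify(result === undefined ? null : result));";

  const child = spawn(process.execPath, ["-e", source]);
  let stdout = "";
  let stderr = "";
  child.stdout.setEncoding("utf8");
  child.stderr.setEncoding("utf8");
  child.stdout.on("data", function (chunk) { stdout += chunk; });
  child.stderr.on("data", function (chunk) { stderr += chunk; });

  child.on("close", function (code) {
    if (code !== 0) {
      callback(new Error(stderr || "child exited with code " + code));
      return;
    }
    let parsed;
    try {
      parsed = JSON.parse(stdout);
    } catch (err) {
      callback(err);
      return;
    }
    callback(null, parsed);
  });
}

runInProcess({ doFor: 3 }, function (data) {
  return { remaining: data.doFor - 1 };
}, function (err, result) {
  if (err) throw err;
  console.log(result);
});
```

Anything the function writes to stdout itself (for example with console.log) would break the JSON parsing, so in this sketch the function should only return its result.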

Upvotes: 0
