papercowboy

Reputation: 3459

How to use Node.js cluster with ZeroMQ reply workers

The basic question is: how do you set up a ZeroMQ REPLY (REP) socket as a cluster worker?

In effect: how do you swap the HTTP server in the default cluster example for a ZeroMQ REPLY server, e.g.:

var cluster = require('cluster'),
    zmq = require('zmq');

if (cluster.isMaster) {
  cluster.fork();
  cluster.fork();
}
else {
  // Using `http.createServer(..).listen(5555);` works perfectly

  // However, the following does not:
  var socket = zmq.socket('rep');
  socket.bind( "tcp://*:5555", function (err) { console.log(process.pid, err); } );
}

The first worker logs undefined (i.e. no error), whereas the second worker logs the error "Address already in use".

From the cluster module's "How it works" docs, the passage that seems relevant is this one (emphasis added):

When you call server.listen(...) in a worker, it serializes the arguments and passes the request to the master process. If the master process already has a listening server matching the worker's requirements, then it passes the handle to the worker. If it does not already have a listening server matching that requirement, then it will create one, and pass the handle to the worker.

How do I "match requirements" with a ZeroMQ REPLY socket?

Upvotes: 2

Views: 1662

Answers (1)

Jason

Reputation: 13766

You seem to be heading down a slightly convoluted path. Here are the first things you should note:

  • the http and cluster modules of node.js are built in, which means they can work together in certain ways that aren't available to external/third-party modules.
  • the zmq bindings module is a third-party module, so it will not enjoy the same efficiencies and abstractions that you get by combining http and cluster.
  • ZMQ is not a replacement for HTTP in any context (just in case you thought it might be; that isn't clear from your question).

The upshot is that when you bind() on the same address/port, node will not "serialize the arguments" or "pass the request to the master process", and the master process cannot "create a (bound) server matching the worker's requirements and pass the handle to the worker". The zmq binding doesn't work this way: libzmq opens and binds its own sockets directly, outside node's handle-passing, so the second worker's bind() simply fails with "Address already in use". Getting the cluster behaviour would essentially mean emulating cluster from within zmq.

You can't use ZeroMQ in this way in node.

But there's good news!

The reason we run an HTTP server this way in node is that it lets you serve a higher number of requests. Typically you receive an HTTP request and process and handle it in some way. While that processing runs (especially any CPU-bound work, which occupies node's single event loop), a server under heavy traffic can be slow to accept new requests. Enter cluster. With multiple listeners in separate workers, while one worker is handling a request, the next worker is immediately available to handle the next one, so it takes that much more traffic to actually bog things down.

You don't need to worry about that with ZMQ:

  • Your single ZMQ socket should be able to handle massive throughput: hundreds of thousands, or even millions, of messages per second.
  • The "Q" part of ZMQ is what allows this - messages are received into an internal queue, and you pull them off of that queue to process them, so you are never(*) blocked from receiving new messages when you're handling the other ones.
  • If you absolutely need some parallelization to increase bandwidth (it's possible, but don't optimize prematurely: don't solve this problem unless you know you're in the narrow population that will actually experience it), you can bind on multiple addresses/ports, use the built-in round-robin distribution of ZMQ's more advanced sockets (DEALER, etc.), or use more advanced messaging patterns to achieve the parallelizing effect.

(*) You can overload your system and drop new messages, depending on your memory and processing constraints and the size of your messages; this is what the High Water Mark is for. Running into this limit is realistic, but how to deal with it is specific to your situation and outside the scope of your current question.
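If you do want several processes answering requests (the DEALER round-robin option above), the usual ZeroMQ shape is a request-reply broker: one process binds, and the REP workers connect rather than bind, so nothing ever competes for port 5555. Below is a minimal sketch of that structure wired into cluster. It assumes the callback-style API of the `zmq` package from the question (`socket()`, `bindSync()`, `connect()`, `on('message')`, `send()` with an array for multipart messages); newer `zeromq` 6.x releases use a different, promise-based API. The function names and the `ipc://` path are hypothetical, and on Windows a `tcp://127.0.0.1:...` backend address may be needed instead. The `zmq` module is passed in as a parameter so the forwarding logic can be exercised without a live ZeroMQ install.

```javascript
'use strict';

// Broker run by the cluster master (hypothetical helper).
// ROUTER faces the clients, DEALER faces the REP workers.
function startBroker(zmq, frontendAddr, backendAddr) {
  const frontend = zmq.socket('router'); // REQ clients connect here
  const backend = zmq.socket('dealer');  // REP workers connect here
  frontend.bindSync(frontendAddr);
  backend.bindSync(backendAddr);

  // Each frame of a multipart message arrives as a separate argument.
  // Forwarding every frame unchanged keeps the ROUTER's identity
  // envelope intact, so replies find their way back to the right client.
  frontend.on('message', function () {
    backend.send(Array.prototype.slice.call(arguments));
  });
  backend.on('message', function () {
    frontend.send(Array.prototype.slice.call(arguments));
  });
  return { frontend, backend };
}

// REP worker run by each cluster worker (hypothetical helper).
function startReplyWorker(zmq, backendAddr, handle) {
  const rep = zmq.socket('rep');
  rep.connect(backendAddr); // connect, not bind: only the broker owns ports
  rep.on('message', function (request) {
    rep.send(handle(request));
  });
  return rep;
}

// Cluster wiring: only the master binds; the forked workers connect.
function main(cluster, zmq) {
  const BACKEND = 'ipc:///tmp/hypothetical-backend.ipc'; // assumed path
  if (cluster.isMaster) {
    startBroker(zmq, 'tcp://*:5555', BACKEND);
    cluster.fork();
    cluster.fork();
  } else {
    startReplyWorker(zmq, BACKEND, function (request) {
      return process.pid + ': ' + request.toString();
    });
  }
}

module.exports = { startBroker, startReplyWorker, main };
```

From an entry script you would call `main(require('cluster'), require('zmq'))`. Clients keep using a REQ socket connected to port 5555, and the DEALER spreads their requests round-robin across whichever REP workers are connected.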


So, what's the answer?

The answer is that you typically will not need cluster in your node zmq app to achieve server parallelization. You could approximate that sort of structure if you really, really wanted to, but you'd typically be better off writing your app in your master process. If you then find you need to perform processor-intensive tasks in parallel, spin up a worker to handle that specific task that returns or exits when it is done. Better yet, write the processor-intensive code in a more efficient language like C++ as an asynchronous module, which takes advantage of node's inherent strengths and offloads its weaknesses.
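A minimal sketch of that recommendation: a single REP socket in the main process (no cluster), with each processor-intensive job handed to a short-lived worker that exits when it is done. It uses Node's built-in `worker_threads` module, swapped in here for a forked child process; that module is available in current Node releases but did not exist when this answer was written. `runHeavyJob`, `serve` and the sum-of-squares job are hypothetical stand-ins, and `socket` is assumed to be a REP socket from the `zmq` bindings (`on('message')`, `send()`).

```javascript
'use strict';
const { Worker } = require('worker_threads');

// Body of the short-lived worker. Summing squares is a hypothetical
// stand-in for real CPU-bound work.
const JOB_SOURCE = `
  const { parentPort, workerData } = require('worker_threads');
  let total = 0;
  for (let i = 1; i <= workerData; i++) total += i * i;
  parentPort.postMessage(total);
`;

// Run one job off the main event loop; the worker exits after replying.
function runHeavyJob(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(JOB_SOURCE, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error('worker exited with code ' + code));
    });
  });
}

// One REP socket in the main process handles all requests.
function serve(socket) {
  socket.on('message', (request) => {
    const n = parseInt(request.toString(), 10);
    runHeavyJob(n)
      .then((result) => socket.send(String(result)))
      .catch((err) => socket.send('error: ' + err.message));
  });
}

module.exports = { runHeavyJob, serve };
```

In practice you would create the socket with `require('zmq').socket('rep')`, call `bindSync('tcp://*:5555')` on it, and pass it to `serve`. Because REP works in strict receive/send lockstep, this serves one request at a time while keeping the event loop free; keeping several jobs in flight at once would need a ROUTER socket instead.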

Upvotes: 7
