Ervadac

Reputation: 956

Node.js domain cluster worker disconnect

Looking at the example given on the Node.js domain doc page (http://nodejs.org/api/domain.html), the recommended way to restart a worker when using cluster is to first call disconnect() on the worker side and listen for the disconnect event on the master side. However, if you just copy/paste the example given, you will notice that the disconnect() call does not shut down the current worker:

try {
    // Make sure we close down within 30 seconds
    var killtimer = setTimeout(function() {
        process.exit(1);
    }, 30000);
    // But don't keep the process open just for that!
    killtimer.unref();
    // Stop taking new requests and tell the master we are done
    server.close();
    cluster.worker.disconnect();
    // Try to send an error to the request that triggered the problem
    res.statusCode = 500;
    res.setHeader('content-type', 'text/plain');
    res.end('Oops, there was a problem!\n');
} catch (er2) {
    console.error('Error sending 500!', er2.stack);
}
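
For context, the master side of the doc's example pairs this with a disconnect listener that forks a replacement worker. A minimal sketch of that part (paraphrased from the doc's example, not copied verbatim):

var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
    // Fork one worker per CPU, and replace any worker that disconnects
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    cluster.on('disconnect', function(worker) {
        console.error('disconnect!');
        cluster.fork();
    });
}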

What happens here is:

  1. I do a GET request at /error

    • A timer is started: in 30s the process will be killed if it hasn't already exited
    • The http server is shut down
    • The worker is disconnected (but still alive)
    • The 500 page is displayed
  2. I do a second GET request at /error (before the 30s are up)

    • New timer started
    • The server is already closing => the second server.close() throws an error
    • The error is caught in the "catch" block and no response is sent back, so on the client side the page hangs without any message.

In my opinion, it would be better to just kill the worker and listen for the 'exit' event on the master side to fork again. This way, the 500 error is always sent when an error occurs:

try {
    var killtimer = setTimeout(function() {
        process.exit(1);
    }, 30000);
    killtimer.unref();
    server.close();
    res.statusCode = 500;
    res.setHeader('content-type', 'text/plain');
    res.end('Oops, there was a problem!\n');
    // Kill the worker only after the 500 response has been queued
    cluster.worker.kill();
} catch (er2) {
    console.error('Error sending 500!', er2);
}
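
For the master side of this variant, I'd expect something like the following (my own sketch using the standard cluster 'exit' event, not code from the doc):

var cluster = require('cluster');

if (cluster.isMaster) {
    cluster.fork();
    // 'exit' fires whether the worker called kill() or died on its own
    cluster.on('exit', function(worker, code, signal) {
        console.error('worker %d died, forking a new one', worker.process.pid);
        cluster.fork();
    });
}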

I'm not sure about the downsides of using kill instead of disconnect, but it seems disconnect waits for the server to close, and that does not seem to be working here (at least not the way it should).

I would just like some feedback on this. There could be a good reason this example is written this way that I've missed.

Thanks

EDIT:

I've just checked with curl, and it works well.
However, I was previously testing with Chrome, and it seems that after receiving the 500 response, Chrome sends a second request BEFORE the server actually finishes closing.
In this case the server is closing but not yet closed (which means the worker is also disconnecting without being disconnected), so the second request is handled by the same worker as before:

  1. It prevents the server from finishing its close
  2. When the second server.close(); line is evaluated, it throws an exception because the server has already stopped listening.
  3. All subsequent requests trigger the same exception until the killtimer callback fires.

Upvotes: 3

Views: 1912

Answers (2)

kristok

Reputation: 152

I ran into the same problem around 6 months ago; sadly, I don't have any code to demonstrate, as it was from my previous job. I solved it by explicitly sending a message to the worker and calling disconnect at the same time. disconnect prevents the worker from taking on new work, and since in my case I was tracking all the work the worker was doing (it was an upload service with long-running uploads), I was able to wait until all of it had finished and then exit with 0.
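
From memory, the shape of it was roughly the following (a reconstruction, not the original code; the 'shutdown' message name and the pendingUploads counter are made up for illustration). The master would call worker.send('shutdown') and worker.disconnect(), and the worker would do something like:

var pendingUploads = 0;   // incremented when an upload starts, decremented when it ends
var shuttingDown = false;

process.on('message', function(msg) {
    if (msg === 'shutdown') {
        shuttingDown = true;
        maybeExit();
    }
});

function maybeExit() {
    // Exit cleanly once every tracked upload has finished
    if (shuttingDown && pendingUploads === 0) {
        process.exit(0);
    }
}

// Wrap each unit of work with these so maybeExit() runs when the last one drains
function uploadStarted() { pendingUploads++; }
function uploadFinished() { pendingUploads--; maybeExit(); }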

Upvotes: 0

Ervadac

Reputation: 956

I figured it out: when the server is closing and receives a request at the same time, it stops its closing process.
So it still accepts connections, but can no longer be closed.

Even without cluster, this simple example illustrates this:

var PORT = 8080;
var domain = require('domain');
var server = require('http').createServer(function(req, res) {
    var d = domain.create();
    d.on('error', function(er) {
        try {
            var killtimer = setTimeout(function() {
                process.exit(1);
            }, 30000);
            killtimer.unref();
            console.log('Trying to close the server');
            server.close(function() {
                console.log('server is closed!');
            });
            console.log('The server should not accept new requests now; it should be in "closing state"');
            res.statusCode = 500;
            res.setHeader('content-type', 'text/plain');
            res.end('Oops, there was a problem!\n');
        } catch (er2) {
            console.error('Error sending 500!', er2);
        }
    });

    d.add(req);
    d.add(res);

    d.run(function() {
        console.log('New request at: %s', req.url);
        // flerb is undefined: throw an async error to trigger the domain
        setTimeout(function() {
            flerb.bark();
        });
    });
});
server.listen(PORT);

Just run:

curl http://127.0.0.1:8080/ http://127.0.0.1:8080/ 

Output:

New request at: /
Trying to close the server
The server should not accept new requests now; it should be in "closing state"
New request at: /
Trying to close the server
Error sending 500! [Error: Not running]

Now single request:

curl http://127.0.0.1:8080/

Output:

New request at: /
Trying to close the server
The server should not accept new requests now; it should be in "closing state"
server is closed!

So with Chrome making one more request (for the favicon, for example), the server is not able to shut down.

For now I'll keep using worker.kill(), which makes the worker exit without waiting for the server to stop.

Upvotes: 3
