Ervadac

Reputation: 956

Node.js domain cluster worker disconnect

Looking at the example given on the Node.js domain doc page (http://nodejs.org/api/domain.html), the recommended way to restart a worker when using cluster is to first call disconnect() on the worker side and listen for the disconnect event on the master side. However, if you just copy/paste the example given, you will notice that the disconnect() call does not shut down the current worker:

try {
    // Make sure we close down within 30 seconds
    var killtimer = setTimeout(function() {
        process.exit(1);
    }, 30000);
    // But don't keep the process open just for that!
    killtimer.unref();
    // Stop taking new requests and tell the master we are done
    server.close();
    cluster.worker.disconnect();
    // Try to send an error to the request that triggered the problem
    res.statusCode = 500;
    res.setHeader('content-type', 'text/plain');
    res.end('Oops, there was a problem!\n');
} catch (er2) {
    console.error('Error sending 500!', er2.stack);
}
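
For context, the master side of the doc's example pairs this with a disconnect listener that forks a replacement worker. A minimal sketch of that part (paraphrased from the doc's example, not copied verbatim):

var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
    // Fork one worker per CPU, and replace any worker that disconnects
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    cluster.on('disconnect', function(worker) {
        console.error('disconnect!');
        cluster.fork();
    });
}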

What happens here is:

  1. I do a GET request at /error

    • A timer is started: in 30s the process will be killed if it hasn't already exited
    • The http server is shut down
    • The worker is disconnected (but still alive)
    • The 500 page is displayed
  2. I do a second GET request at /error (before the 30s are up)

    • New timer started
    • The server is already closing => the second server.close() throws an error
    • The error is caught in the "catch" block and no response is sent back, so on the client side the page hangs without any message.

In my opinion, it would be better to just kill the worker and listen for the 'exit' event on the master side to fork again. This way, the 500 error is always sent when an error occurs:

try {
    var killtimer = setTimeout(function() {
        process.exit(1);
    }, 30000);
    killtimer.unref();
    server.close();
    res.statusCode = 500;
    res.setHeader('content-type', 'text/plain');
    res.end('Oops, there was a problem!\n');
    // Kill the worker only after the 500 response has been queued
    cluster.worker.kill();
} catch (er2) {
    console.error('Error sending 500!', er2);
}
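
For the master side of this variant, I'd expect something like the following (my own sketch using the standard cluster 'exit' event, not code from the doc):

var cluster = require('cluster');

if (cluster.isMaster) {
    cluster.fork();
    // 'exit' fires whether the worker called kill() or died on its own
    cluster.on('exit', function(worker, code, signal) {
        console.error('worker %d died, forking a new one', worker.process.pid);
        cluster.fork();
    });
}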

I'm not sure about the downsides of using kill instead of disconnect, but it seems disconnect waits for the server to close, and that does not seem to be working here (at least not the way it should).

I would just like some feedback on this. There could be a good reason this example is written this way that I've missed.

Thanks

EDIT:

I've just checked with curl, and it works well.
However, I was previously testing with Chrome, and it seems that after receiving the 500 response, Chrome sends a second request BEFORE the server actually finishes closing.
In this case the server is closing but not yet closed (which means the worker is also disconnecting without being disconnected), so the second request is handled by the same worker as before:

  1. It prevents the server from finishing its close
  2. When the second server.close(); line is evaluated, it throws an exception because the server has already stopped listening.
  3. All subsequent requests trigger the same exception until the killtimer callback fires.

Upvotes: 3

Views: 1912

Answers (2)

kristok

Reputation: 152

I ran into the same problem around 6 months ago; sadly, I don't have any code to demonstrate, as it was from my previous job. I solved it by explicitly sending a message to the worker and calling disconnect at the same time. disconnect prevents the worker from taking on new work, and since in my case I was tracking all the work the worker was doing (it was an upload service with long-running uploads), I was able to wait until all of it had finished and then exit with 0.
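
From memory, the shape of it was roughly the following (a reconstruction, not the original code; the 'shutdown' message name and the pendingUploads counter are made up for illustration). The master would call worker.send('shutdown') and worker.disconnect(), and the worker would do something like:

var pendingUploads = 0;   // incremented when an upload starts, decremented when it ends
var shuttingDown = false;

process.on('message', function(msg) {
    if (msg === 'shutdown') {
        shuttingDown = true;
        maybeExit();
    }
});

function maybeExit() {
    // Exit cleanly once every tracked upload has finished
    if (shuttingDown && pendingUploads === 0) {
        process.exit(0);
    }
}

// Wrap each unit of work with these so maybeExit() runs when the last one drains
function uploadStarted() { pendingUploads++; }
function uploadFinished() { pendingUploads--; maybeExit(); }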

Upvotes: 0

Ervadac

Reputation: 956

I figured it out: when the server is closing and receives a request at the same time, it stops its closing process.
So it still accepts connections, but can no longer be closed.

Even without cluster, this simple example illustrates this:

var PORT = 8080;
var domain = require('domain');
var server = require('http').createServer(function(req, res) {
    var d = domain.create();
    d.on('error', function(er) {
        try {
            var killtimer = setTimeout(function() {
                process.exit(1);
            }, 30000);
            killtimer.unref();
            console.log('Trying to close the server');
            server.close(function() {
                console.log('server is closed!');
            });
            console.log('The server should not accept new requests now; it should be in "closing state"');
            res.statusCode = 500;
            res.setHeader('content-type', 'text/plain');
            res.end('Oops, there was a problem!\n');
        } catch (er2) {
            console.error('Error sending 500!', er2);
        }
    });

    d.add(req);
    d.add(res);

    d.run(function() {
        console.log('New request at: %s', req.url);
        // flerb is undefined: throw an async error to trigger the domain
        setTimeout(function() {
            flerb.bark();
        });
    });
});
server.listen(PORT);

Just run:

curl http://127.0.0.1:8080/ http://127.0.0.1:8080/ 

Output:

New request at: /
Trying to close the server
The server should not accept new requests now; it should be in "closing state"
New request at: /
Trying to close the server
Error sending 500! [Error: Not running]

Now single request:

curl http://127.0.0.1:8080/

Output:

New request at: /
Trying to close the server
The server should not accept new requests now; it should be in "closing state"
server is closed!

So with Chrome making one more request (for the favicon, for example), the server is not able to shut down.

For now I'll keep using worker.kill(), which makes the worker exit without waiting for the server to stop.

Upvotes: 3
