Reputation: 753
I'm running RethinkDB in Docker. Everything was working REALLY well for a while until we moved to a new datacenter (but im not sure if this is related to the move at all). Here's what is going on.
I start the rethinkdb container and everything runs well for a while. After some amount of time (it varies between an hour or more) I see the following in the Docker logs (highlighted in yellow):
I have absolutely no idea why it receives a SIGTERM from the system randomly. Any ideas would be greatly appreciated!
EDIT: I am adding the snippet of the logfile for the SIGTERMs. There doesn't seem to be any kind of pattern according to the timestamps.
2015-07-15T16:15:02.888762613 663165.661585s notice: Server got SIGTERM from pid 0, uid 0; shutting down...
2015-07-17T17:02:11.562306701 13322.914561s notice: Server got SIGTERM from pid 0, uid 0; shutting down...
2015-07-19T18:31:12.499022237 96786.220054s notice: Server got SIGTERM from pid 0, uid 0; shutting down...
2015-07-20T13:52:44.493304030 69690.608865s notice: Server got SIGTERM from pid 0, uid 0; shutting down...
EDIT 2: I ran RethinkDB outside of Docker and I see this in the logs: error: worker process failed to resynchronize with main process. Not sure if it's anything to be concerned about. It does not appear to affect the RethinkDB instance at all (all clients remained connected).
2015-07-21T06:53:10.663375859 0.116098s info: Automatically using cache size of 10702 MB
2015-07-21T06:53:10.676277261 0.128998s notice: Listening for intracluster connections on port 29015
2015-07-21T06:53:10.684504354 0.137225s notice: Listening for client driver connections on port 28015
2015-07-21T06:53:10.685485550 0.138206s notice: Listening for administrative HTTP connections on port 8080
2015-07-21T06:53:10.686313405 0.139034s notice: Listening on addresses: 127.0.0.1, 172.17.42.1, 192.151.151.122, ::1, fe80::1879:43ff:fe5e:bdb2%34, fe80::62eb:69ff:fe07:d986%2, fe80::b837:f2ff:fecd:d5cd%4
2015-07-21T06:53:10.686316632 0.139037s notice: Server ready, "0aa312e817ef_nrx" 069ac5b3-9f43-4bbe-9022-c1f006790e99
2015-07-21T06:53:11.558116243 1.010837s error: worker process failed to resynchronize with main process
2015-07-21T06:53:11.558122179 1.010843s notice: A newer version of the RethinkDB server is available: 2.0.4. You can read the changelog at <https://github.com/rethinkdb/rethinkdb/releases>.
EDIT 3: I found another issue here that I think may be the real problem. The rethink adapter (in the app) is keeping the connections to the DB server established which is exhausting the available file descriptors/ports in the system. Here is an example printout of an lsof
. NOTE this is only a short list. There are hundreds/thousands of these kept open when multiple people use the system
node 11633 [username_ommitted] 201u IPv4 0x53153575d33d64bb 0t0 TCP 192.168.1.142:61041->[RETHINK_IP]:28015 (ESTABLISHED)
node 11633 [username_ommitted] 202u IPv4 0x53153575d33fa65b 0t0 TCP 192.168.1.142:61053->[RETHINK_IP]:28015 (ESTABLISHED)
node 11633 [username_ommitted] 203u IPv4 0x53153575dd6a5d8b 0t0 TCP 192.168.1.142:61043->[RETHINK_IP]:28015 (ESTABLISHED)
node 11633 [username_ommitted] 204u IPv4 0x53153575bff6717b 0t0 TCP 192.168.1.142:61044->[RETHINK_IP]:28015 (ESTABLISHED)
node 11633 [username_ommitted] 206u IPv4 0x53153575d33e54bb 0t0 TCP 192.168.1.142:61049->[RETHINK_IP]:28015 (ESTABLISHED)
node 11633 [username_ommitted] 207u IPv4 0x53153575d33ef4bb 0t0 TCP 192.168.1.142:61050->[RETHINK_IP]:28015 (ESTABLISHED)
node 11633 [username_ommitted] 208u IPv4 0x53153575d33f2a4b 0t0 TCP 192.168.1.142:61051->[RETHINK_IP]:28015 (ESTABLISHED)
node 11633 [username_ommitted] 209u IPv4 0x53153575c333a17b 0t0 TCP 192.168.1.142:61054->[RETHINK_IP]:28015 (ESTABLISHED)
node 11633 [username_ommitted] 210u IPv4 0x53153575d33b47fb 0t0 TCP 192.168.1.142:61056->[RETHINK_IP]:28015 (ESTABLISHED)
node 11633 [username_ommitted] 211u IPv4 0x53153575d33de17b 0t0 TCP 192.168.1.142:61057->[RETHINK_IP]:28015 (ESTABLISHED)
node 11633 [username_ommitted] 212u IPv4 0x53153575d33f065b 0t0 TCP 192.168.1.142:61058->[RETHINK_IP]:28015 (ESTABLISHED)
node 11633 [username_ommitted] 213u IPv4 0x53153575bff67a4b 0t0 TCP 192.168.1.142:61059->[RETHINK_IP]:28015 (ESTABLISHED)
node 11633 [username_ommitted] 216u IPv4 0x53153575dd68f31b 0t0 TCP 192.168.1.142:61062->[RETHINK_IP]:28015 (ESTABLISHED)
node 11633 [username_ommitted] 217u IPv4 0x53153575dd675a4b 0t0 TCP 192.168.1.142:61063->[RETHINK_IP]:28015 (ESTABLISHED)
Upvotes: 3
Views: 592
Reputation: 138
I had the exact same issue and believe it to be caused by not closing database connections (as you note in your EDIT 3
). I'm using RethinkDB in a expressjs app and followed there middleware example from here but I always terminate the request-response cycle within my controllers without calling next()
, meaning the closeConnection middleware was never being reached.
Upvotes: 0