hugo

Reputation: 1245

Mongo Crashes Periodically

We have a 3-node replica set that periodically crashes and is unable to recover. Looking through the mongod.log file on our PRIMARY server, I see multiple errors. I'm not sure where to begin or what to include in this post, so I'll start with the errors I am receiving. If I'm missing something, please let me know and I'll edit the post to include it. Can anyone shed light on why this is happening?

Thu Feb 27 14:09:47.790 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.790 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.790 [rsBackgroundSync] replSet syncing to: mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.791 [rsBackgroundSync] repl: couldn't connect to server mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.792 [conn152] end connection xx.xxx.xxx.107:43904 (71 connections now open)
Thu Feb 27 14:09:48.077 [rsHealthPoll] DBClientCursor::init call() failed
Thu Feb 27 14:09:48.077 [rsHealthPoll] replset info mongos2i.hostname.com:27017 heartbeat failed, retrying
Thu Feb 27 14:09:48.078 [rsHealthPoll] replSet info mongos2i.hostname.com:27017 is down (or slow to respond):
Thu Feb 27 14:09:48.078 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state DOWN
Thu Feb 27 14:09:48.080 [rsMgr] not electing self, mongos1i.hostname.com:27017 would veto with 'mongom1i.hostname.com:27017 is trying to elect itself but mongos2i.hostname.com:27017 is already primary and more up-to-date'
Thu Feb 27 14:09:49.079 [conn153] replSet info voting yea for mongos1i.hostname.com:27017 (1)
Thu Feb 27 14:09:50.080 [rsHealthPoll] replSet member mongos1i.hostname.com:27017 is now in state PRIMARY
Thu Feb 27 14:09:50.081 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is up
Thu Feb 27 14:09:50.082 [initandlisten] connection accepted from xx.xxx.xxx.107:43907 #154 (72 connections now open)
Thu Feb 27 14:09:50.082 [conn154] end connection xx.xxx.xxx.107:43907 (71 connections now open)
Thu Feb 27 14:09:50.086 [initandlisten] connection accepted from xx.xxx.xxx.107:43909 #155 (72 connections now open)
Thu Feb 27 14:09:50.792 [rsBackgroundSync] replSet syncing to: mongos1i.hostname.com:27017
Thu Feb 27 14:09:52.082 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state SECONDARY
Thu Feb 27 14:10:04.090 [conn155] end connection xx.xxx.xxx.107:43909 (71 connections now open)
Thu Feb 27 14:10:04.091 [initandlisten] connection accepted from xx.xxx.xxx.107:43913 #156 (72 connections now open)
Thu Feb 27 14:10:10.731 [conn153] end connection xx.xxx.xxx.97:52297 (71 connections now open)
Thu Feb 27 14:10:10.732 [initandlisten] connection accepted from xx.xxx.xxx.97:52302 #157 (72 connections now open)
Thu Feb 27 14:10:29.706 [initandlisten] connection accepted from 127.0.0.1:56436 #158 (73 connections now open)
Thu Feb 27 14:10:34.100 [conn156] end connection xx.xxx.xxx.107:43913 (72 connections now open)
Thu Feb 27 14:10:34.101 [initandlisten] connection accepted from xx.xxx.xxx.107:43916 #159 (73 connections now open)
Thu Feb 27 14:10:40.743 [conn157] end connection xx.xxx.xxx.97:52302 (72 connections now open)
Thu Feb 27 14:10:40.744 [initandlisten] connection accepted from xx.xxx.xxx.97:52309 #160 (73 connections now open)
Thu Feb 27 14:11:04.110 [conn159] end connection xx.xxx.xxx.107:43916 (72 connections now open)
Thu Feb 27 14:11:04.111 [initandlisten] connection accepted from xx.xxx.xxx.107:43918 #161 (73 connections now open)
Thu Feb 27 14:11:09.191 [conn161] end connection xx.xxx.xxx.107:43918 (72 connections now open)
Thu Feb 27 14:11:09.452 [initandlisten] connection accepted from xx.xxx.xxx.107:43919 #162 (73 connections now open)
Thu Feb 27 14:11:09.453 [conn162] end connection xx.xxx.xxx.107:43919 (72 connections now open)
Thu Feb 27 14:11:09.456 [initandlisten] connection accepted from xx.xxx.xxx.107:43921 #163 (73 connections now open)
Thu Feb 27 14:11:10.111 [rsHealthPoll] DBClientCursor::init call() failed
Thu Feb 27 14:11:10.111 [rsHealthPoll] replset info mongos2i.hostname.com:27017 heartbeat failed, retrying
Thu Feb 27 14:11:10.113 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state STARTUP2
Thu Feb 27 14:11:10.755 [conn160] end connection xx.xxx.xxx.97:52309 (72 connections now open)
Thu Feb 27 14:11:10.757 [initandlisten] connection accepted from xx.xxx.xxx.97:52311 #164 (73 connections now open)
Thu Feb 27 14:11:12.113 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state SECONDARY
Thu Feb 27 14:11:23.462 [conn163] end connection xx.xxx.xxx.107:43921 (72 connections now open)
Thu Feb 27 14:11:23.463 [initandlisten] connection accepted from xx.xxx.xxx.107:43925 #165 (73 connections now open)
Thu Feb 27 14:11:31.831 [conn158] end connection 127.0.0.1:56436 (72 connections now open)
Thu Feb 27 14:11:40.768 [conn164] end connection xx.xxx.xxx.97:52311 (71 connections now open)
Thu Feb 27 14:11:40.769 [initandlisten] connection accepted from xx.xxx.xxx.97:52315 #166 (72 connections now open)
Thu Feb 27 14:11:53.082 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
Thu Feb 27 14:11:53.082 [signalProcessingThread] now exiting
Thu Feb 27 14:11:53.082 dbexit:

We are using CentOS and Mongo 2.4.9.

Thanks in advance for the help.

Upvotes: 2

Views: 1250

Answers (1)

daveh

Reputation: 3696

The log output you have posted shows that your MongoDB instance did not crash. It exited normally. Consider the following lines:

Thu Feb 27 14:11:53.082 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
Thu Feb 27 14:11:53.082 [signalProcessingThread] now exiting
Thu Feb 27 14:11:53.082 dbexit:

The first line above indicates that your MongoDB instance received signal 15 (SIGTERM) from your OS, which led MongoDB to shut down. SIGTERM is the default signal sent by the kill command and by the stop portion of an init script in most Linux distros. In other words, something on the machine asked mongod to stop; it did not fail on its own.
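If you want to check the rest of your logs for the same pattern, here is a minimal Python sketch. It assumes the 2.4-style line format shown above; the function names find_shutdown_signals and is_sigterm are made up for illustration and are not part of any MongoDB tooling.

```python
import re
import signal

# Matches the 2.4-style log line quoted above, for example:
# "Thu Feb 27 14:11:53.082 [signalProcessingThread] got signal 15 (Terminated), ..."
# Assumption: other MongoDB versions may format this line differently.
SIGNAL_LINE = re.compile(
    r"^(?P<timestamp>.+?) \[signalProcessingThread\] "
    r"got signal (?P<number>\d+) \((?P<name>[^)]*)\)"
)


def find_shutdown_signals(lines):
    """Return (timestamp, signal_number, description) for each signal line."""
    found = []
    for line in lines:
        match = SIGNAL_LINE.match(line)
        if match:
            found.append(
                (match.group("timestamp"),
                 int(match.group("number")),
                 match.group("name"))
            )
    return found


def is_sigterm(signal_number):
    """True if the number is SIGTERM (15 on Linux), the default for kill."""
    return signal_number == signal.SIGTERM


if __name__ == "__main__":
    # Sample lines copied from the log in the question.
    sample = [
        "Thu Feb 27 14:11:40.769 [initandlisten] connection accepted from xx.xxx.xxx.97:52315 #166 (72 connections now open)",
        "Thu Feb 27 14:11:53.082 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends",
        "Thu Feb 27 14:11:53.082 [signalProcessingThread] now exiting",
    ]
    for timestamp, number, name in find_shutdown_signals(sample):
        kind = "requested shutdown (SIGTERM)" if is_sigterm(number) else "other signal"
        print(f"{timestamp}: signal {number} ({name}) -> {kind}")
```

To run it against a real log, pass an open file handle for your mongod.log to find_shutdown_signals instead of the sample list. The timestamps it reports are the times to compare with whatever might be stopping the service on that host.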

Upvotes: 5
