Nuno Sousa
Nuno Sousa

Reputation: 882

Mongodb "NetworkInterfaceExceededTimeLimit" replicaset

Solved:

It most have been a version specific bug. Updating to [email protected] solved the issue and the machines connect fine. I'll leave the question for eventualy someone who faces the same issue.

Original question:

I have two servers in a two member replica set.

mx.aireclaim.com is the PRIMARY and vimax.aireclaim.com is the SECONDARY. After a reboot on mx.aireclaim.com I'm faced with a connection issue on the secondary side. I've checked the firewall and can access the 27017 port from each to machine to the other.

I keep getting this error in my secondary:

Error in heartbeat (...) response status: NetworkInterfaceExceededTimeLimit:
Couldn't get a connection within the time limit

Is there something foul I'm missing? I've tried syncing the date on both machines, disabling firewalls, multiple restarts, and config changes, to no avail.

Here are the replicaset config and rs.status():

PRIMARY

    aireclaimRs:PRIMARY> rs.config()
    {
        "_id" : "aireclaimRs",
        "version" : 3,
        "protocolVersion" : NumberLong(1),
        "members" : [
            {
                "_id" : 0,
                "host" : "vimax.aireclaim.com:27017",
                "arbiterOnly" : false,
                "buildIndexes" : true,
                "hidden" : false,
                "priority" : 2,
                "tags" : {
                    
                },
                "slaveDelay" : NumberLong(0),
                "votes" : 1
            },
            {
                "_id" : 1,
                "host" : "mx.aireclaim.com:27017",
                "arbiterOnly" : false,
                "buildIndexes" : true,
                "hidden" : false,
                "priority" : 1,
                "tags" : {
                    
                },
                "slaveDelay" : NumberLong(0),
                "votes" : 1
            }
        ],
        "settings" : {
            "chainingAllowed" : true,
            "heartbeatIntervalMillis" : 2000,
            "heartbeatTimeoutSecs" : 10,
            "electionTimeoutMillis" : 10000,
            "catchUpTimeoutMillis" : -1,
            "catchUpTakeoverDelayMillis" : 30000,
            "getLastErrorModes" : {
                
            },
            "getLastErrorDefaults" : {
                "w" : 1,
                "wtimeout" : 0
            },
            "replicaSetId" : ObjectId("5be0f8105dfcbe069f5c8533")
        }
    }
    aireclaimRs:PRIMARY> rs.status()
    {
        "set" : "aireclaimRs",
        "date" : ISODate("2020-03-23T20:08:25.719Z"),
        "myState" : 1,
        "term" : NumberLong(20),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
            "lastCommittedOpTime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "appliedOpTime" : {
                "ts" : Timestamp(1584994100, 1),
                "t" : NumberLong(20)
            },
            "durableOpTime" : {
                "ts" : Timestamp(1584994100, 1),
                "t" : NumberLong(20)
            }
        },
        "members" : [
            {
                "_id" : 0,
                "name" : "vimax.aireclaim.com:27017",
                "health" : 1,
                "state" : 2,
                "stateStr" : "SECONDARY",
                "uptime" : 1595,
                "optime" : {
                    "ts" : Timestamp(1584838063, 1),
                    "t" : NumberLong(11)
                },
                "optimeDurable" : {
                    "ts" : Timestamp(1584838063, 1),
                    "t" : NumberLong(11)
                },
                "optimeDate" : ISODate("2020-03-22T00:47:43Z"),
                "optimeDurableDate" : ISODate("2020-03-22T00:47:43Z"),
                "lastHeartbeat" : ISODate("2020-03-23T20:08:24.148Z"),
                "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
                "pingMs" : NumberLong(0),
                "configVersion" : 3
            },
            {
                "_id" : 1,
                "name" : "mx.aireclaim.com:27017",
                "health" : 1,
                "state" : 1,
                "stateStr" : "PRIMARY",
                "uptime" : 8376,
                "optime" : {
                    "ts" : Timestamp(1584994100, 1),
                    "t" : NumberLong(20)
                },
                "optimeDate" : ISODate("2020-03-23T20:08:20Z"),
                "electionTime" : Timestamp(1584992519, 1),
                "electionDate" : ISODate("2020-03-23T19:41:59Z"),
                "configVersion" : 3,
                "self" : true
            }
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1584994100, 1),
        "$clusterTime" : {
            "clusterTime" : Timestamp(1584994100, 1),
            "signature" : {
                "hash" : BinData(0,"oGCz38lgYjsGTrWc3maAD2vyc6M="),
                "keyId" : NumberLong("6765287379189104641")
            }
        }
    }

SECONDARY

    aireclaimRs:SECONDARY> rs.status()
    {
        "set" : "aireclaimRs",
        "date" : ISODate("2020-03-23T19:56:37.132Z"),
        "myState" : 2,
        "term" : NumberLong(20),
        "syncingTo" : "",
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
            "lastCommittedOpTime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "appliedOpTime" : {
                "ts" : Timestamp(1584838063, 1),
                "t" : NumberLong(11)
            },
            "durableOpTime" : {
                "ts" : Timestamp(1584838063, 1),
                "t" : NumberLong(11)
            }
        },
        "members" : [
            {
                "_id" : 0,
                "name" : "vimax.aireclaim.com:27017",
                "health" : 1,
                "state" : 2,
                "stateStr" : "SECONDARY",
                "uptime" : 890,
                "optime" : {
                    "ts" : Timestamp(1584838063, 1),
                    "t" : NumberLong(11)
                },
                "optimeDate" : ISODate("2020-03-22T00:47:43Z"),
                "syncingTo" : "",
                "syncSourceHost" : "",
                "syncSourceId" : -1,
                "infoMessage" : "",
                "configVersion" : 3,
                "self" : true,
                "lastHeartbeatMessage" : ""
            },
            {
                "_id" : 1,
                "name" : "mx.aireclaim.com:27017",
                "health" : 0,
                "state" : 8,
                "stateStr" : "(not reachable/healthy)",
                "uptime" : 0,
                "optime" : {
                    "ts" : Timestamp(0, 0),
                    "t" : NumberLong(-1)
                },
                "optimeDurable" : {
                    "ts" : Timestamp(0, 0),
                    "t" : NumberLong(-1)
                },
                "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                "optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
                "lastHeartbeat" : ISODate("2020-03-23T19:56:31.106Z"),
                "lastHeartbeatRecv" : ISODate("2020-03-23T19:56:36.034Z"),
                "pingMs" : NumberLong(0),
                "lastHeartbeatMessage" : "Couldn't get a connection within the time limit",
                "syncingTo" : "",
                "syncSourceHost" : "",
                "syncSourceId" : -1,
                "infoMessage" : "",
                "configVersion" : -1
            }
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1584838063, 1),
        "$clusterTime" : {
            "clusterTime" : Timestamp(1584993390, 1),
            "signature" : {
                "hash" : BinData(0,"zINnjtBKXZ14gJrxrUbT2zyurqQ="),
                "keyId" : NumberLong("6765287379189104641")
            }
        }
    }

This is what is happening in the logs:

Primary

tail /var/log/mongodb/mongodb.log
2020-03-23T21:08:09.420+0100 I NETWORK  [conn2630] end connection 144.76.84.5:50114 (2 connections now open)
2020-03-23T21:08:29.420+0100 I NETWORK  [listener] connection accepted from 144.76.84.5:50118 #2631 (3 connections now open)
2020-03-23T21:08:29.420+0100 I NETWORK  [conn2631] end connection 144.76.84.5:50118 (2 connections now open)
2020-03-23T21:08:49.420+0100 I NETWORK  [listener] connection accepted from 144.76.84.5:50122 #2632 (3 connections now open)
2020-03-23T21:08:49.420+0100 I NETWORK  [conn2632] end connection 144.76.84.5:50122 (2 connections now open)
2020-03-23T21:09:09.421+0100 I NETWORK  [listener] connection accepted from 144.76.84.5:50126 #2633 (3 connections now open)
2020-03-23T21:09:09.421+0100 I NETWORK  [conn2633] end connection 144.76.84.5:50126 (2 connections now open)
2020-03-23T21:09:18.717+0100 I NETWORK  [conn2629] end connection 127.0.0.1:60316 (1 connection now open)
2020-03-23T21:09:29.421+0100 I NETWORK  [listener] connection accepted from 144.76.84.5:50130 #2634 (2 connections now open)
2020-03-23T21:09:29.421+0100 I NETWORK  [conn2634] end connection 144.76.84.5:50130 (1 connection now open)

Secondary

tail /var/log/mongodb/mongod.log 
2020-03-23T20:48:13.408+0100 I REPL     [replexec-1] Not starting an election, since we are not electable due to: Not standing for election because I cannot see a majority (mask 0x1)
2020-03-23T20:48:17.592+0100 I REPL_HB  [replexec-1] Error in heartbeat (requestId: 74) to mx.aireclaim.com:27017, response status: NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
2020-03-23T20:48:19.658+0100 I REPL     [rsBackgroundSync] waiting for 2 pings from other members before syncing
2020-03-23T20:48:24.791+0100 I REPL     [replexec-1] Not starting an election, since we are not electable due to: Not standing for election because I cannot see a majority (mask 0x1)
2020-03-23T20:48:28.092+0100 I REPL_HB  [replexec-1] Error in heartbeat (requestId: 76) to mx.aireclaim.com:27017, response status: NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
2020-03-23T20:48:29.593+0100 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to mx.aireclaim.com:27017 - NetworkInterfaceExceededTimeLimit: Operation timed out
2020-03-23T20:48:29.593+0100 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mx.aireclaim.com:27017
2020-03-23T20:48:34.661+0100 I REPL     [rsBackgroundSync] waiting for 2 pings from other members before syncing
2020-03-23T20:48:35.839+0100 I REPL     [replexec-1] Not starting an election, since we are not electable due to: Not standing for election because I cannot see a majority (mask 0x1)
2020-03-23T20:48:38.593+0100 I REPL_HB  [replexec-1] Error in heartbeat (requestId: 77) to mx.aireclaim.com:27017, response status: NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit

So it looks like the primary accepts the connection then ends it.

You would think that there is some conectivity issue between the two machines, but I can :

aireclaim@aireclaim-platform-server:~> mongo mx.aireclaim.com
MongoDB shell version v3.6.10
connecting to: mongodb://mx.aireclaim.com:27017/test?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("de193403-69ca-4f74-a6c5-ee384eb420d7") }
MongoDB server version: 3.6.3
aireclaimRs:PRIMARY> 

and:

root@mx  /home/aireclaim  mongo vimax.aireclaim.com
MongoDB shell version v3.6.3
connecting to: mongodb://vimax.aireclaim.com:27017/test
MongoDB server version: 3.6.10
aireclaimRs:SECONDARY> 

Also I can atest that both machines have port 27017 open and that they are configured to run on those Ips.

In fact the entire thing remains funcional somehow. The clients that atempt to connect to the replicaset manage to do it, but It looks as though the secondary is no longer replicating.

Any insight would be a godsend.

Upvotes: 4

Views: 7352

Answers (1)

Nuno Sousa
Nuno Sousa

Reputation: 882

Updating the mongodb version to 4.0.5 took care of the issue

Upvotes: 1

Related Questions