How to specify socket timeout for a MongoDB server slave replica

I get socket error 110 (Connection timed out) when a Mongo database (version 3.0.5) is replicated from the primary DB server to a slave, more precisely at the moment the replication of that database is committed (the slave's log is below). My guess is that the database is big and the send operation that commits it takes too long.

How can I specify a different socket timeout for the mongo server? If that's not possible, is there any other way to repair replication?

I found such an option only for the mongo client (the connection string option socketTimeoutMS), but it doesn't help with the Mongo server.

2016-04-26T13:36:34.693+0100 I INDEX    [rsSync]         done building bottom layer, going to commit     
2016-04-26T13:36:34.693+0100 I INDEX [rsSync] build index done.  scanned 30980334 total records. 4072 secs    
2016-04-26T13:36:34.772+0100 I REPL     [rsSync] initial sync cloning db: {skipped db name}    
2016-04-26T13:36:34.823+0100 I NETWORK  [rsSync] Socket say send() errno:110 Connection timed out {skipped ip}:27017    
2016-04-26T13:36:34.828+0100 E REPL     [rsSync] 9001 socket exception [SEND_ERROR] server [{skipped ip}:27017]     
2016-04-26T13:36:34.828+0100 E REPL     [rsSync] initial sync attempt failed, 9 attempts remaining

Update. I was asked in the comments for the output of rs.status():

{
        "set" : "<skippedsetname>",
        "date" : ISODate("2016-05-04T15:35:06.717Z"),
        "myState" : 5,
        "syncingTo" : "<skipped domain name of other server>:27017",
        "members" : [
                {
                        "_id" : 0,
                        "name" : "<skipped domain name of this server>:27017",
                        "health" : 1,
                        "state" : 5,
                        "stateStr" : "STARTUP2",
                        "uptime" : 29,
                        "optime" : Timestamp(0, 0),
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "syncingTo" : "<skipped domain name of other server>:27017",
                        "configVersion" : 9,
                        "self" : true
                },
                {
                        "_id" : 2,
                        "name" : "10.0.1.7:27017",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 26,
                        "lastHeartbeat" : ISODate("2016-05-04T15:35:05.859Z"),
                        "lastHeartbeatRecv" : ISODate("2016-05-04T15:35:06.347Z"),
                        "pingMs" : 3,
                        "configVersion" : 9
                },
                {
                        "_id" : 3,
                        "name" : "<skipped domain name of other server>:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 26,
                        "optime" : Timestamp(1462376105, 196),
                        "optimeDate" : ISODate("2016-05-04T15:35:05Z"),
                        "lastHeartbeat" : ISODate("2016-05-04T15:35:05.859Z"),
                        "lastHeartbeatRecv" : ISODate("2016-05-04T15:35:06.086Z"),
                        "pingMs" : 4,
                        "electionTime" : Timestamp(1461688501, 1),
                        "electionDate" : ISODate("2016-04-26T16:35:01Z"),
                        "configVersion" : 9
                }
        ],
        "ok" : 1
}

Update. I should have mentioned, but didn't, that the hosting used is Azure. The answer and explanation are easily found by googling "azure mongodb connection timeout". My bad.

Upvotes: 6

Answers (2)

Héctor Valverde

Reputation: 1103

There are probably some files locking the filesystem on your slave. If I were you, I'd remove the node from the replica set, wipe all files under dbpath, check that the mongo user can access this directory, and restart mongod. Once it's running, add it back to the replica set and wait for it to sync. See also: https://docs.mongodb.org/manual/tutorial/recover-data-following-unexpected-shutdown/#mongod-lock

Upvotes: 0

Soren

Reputation: 14698

Your assumption of the cause of the error is wrong.

Connection timed out: During the attempt to establish the TCP connection, no response came from the other side within a given time limit.

In other words, it is an issue with establishing the socket connection, not a question of how long it takes to replicate the database.

Tuning the TCP timeout is a system setting, not something you do per application. On Linux, the settings are in the system-wide /etc/sysctl.conf, and you can play around with net.ipv4.tcp_syn_retries. However, you almost never change the timeout for establishing a socket (for any program, including mongo). The few times I have changed it, it was to make it shorter so the error arrives faster, not to increase it. Increasing it is unlikely to be the right solution in any earthly application.
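
To give a feel for what net.ipv4.tcp_syn_retries controls, here is a rough sketch in Python. It is not part of the original answer, and it assumes the usual Linux behaviour: the first SYN is retransmitted after about 1 second, each later wait doubles, and a single wait is capped at 120 seconds. The function names are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumptions (not from the original answer): an initial retransmission
# timeout of 1 s that doubles after each attempt, capped at 120 s per wait.
INITIAL_RTO_S = 1.0
MAX_RTO_S = 120.0


def connect_timeout_seconds(syn_retries: int) -> float:
    """Approximate time before connect() gives up with 'Connection timed out'."""
    total = 0.0
    rto = INITIAL_RTO_S
    # The initial SYN and each retry are each followed by one wait.
    for _ in range(syn_retries + 1):
        total += min(rto, MAX_RTO_S)
        rto *= 2
    return total


def read_syn_retries(
    path: str = "/proc/sys/net/ipv4/tcp_syn_retries",
) -> Optional[int]:
    """Return the current kernel setting, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
```

Under these assumptions, connect_timeout_seconds(6) gives 127 seconds and connect_timeout_seconds(3) gives 15 seconds, which shows why lowering the value mainly makes the error show up sooner.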

The problem is either a configuration problem, such as bad IP addresses in your setup, or a networking problem, such as a bad firewall, a bad routing table, or a network switch that sometimes stops working for 60-120 seconds at a time.
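
A quick way to narrow this down is to check, from the slave, whether each replica set member accepts TCP connections at all. The following is a minimal sketch using only the Python standard library. The function name and the 5 second timeout are arbitrary choices for illustration, and it only tests that the port is reachable, not that mongod is healthy.

```python
import socket


def can_connect(host: str, port: int = 27017, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling can_connect with the primary's host name (the one shown as syncingTo in rs.status()) from the slave machine, several times over a few minutes, would show whether the connection fails consistently (configuration) or only intermittently (network).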

Upvotes: 4
