Reputation: 1776
I get socket error 110 (Connection timed out) when a Mongo database (version 3.0.5) is replicated from the primary DB server to the slave. More precisely, it happens at the moment the replication of that database is committed (the slave's log is below). I suspect the reason is that the database is big and the send operation that commits it takes too long.
How can I specify a different socket timeout for the mongo server? If it's not possible, is there any other way to repair replication?
I found such an option only for the mongo client (the connection string option socketTimeoutMS), but it doesn't help with the mongo server.
2016-04-26T13:36:34.693+0100 I INDEX [rsSync] done building bottom layer, going to commit
2016-04-26T13:36:34.693+0100 I INDEX [rsSync] build index done. scanned 30980334 total records. 4072 secs
2016-04-26T13:36:34.772+0100 I REPL [rsSync] initial sync cloning db: {skipped db name}
2016-04-26T13:36:34.823+0100 I NETWORK [rsSync] Socket say send() errno:110 Connection timed out {skipped ip}:27017
2016-04-26T13:36:34.828+0100 E REPL [rsSync] 9001 socket exception [SEND_ERROR] server [{skipped ip}:27017]
2016-04-26T13:36:34.828+0100 E REPL [rsSync] initial sync attempt failed, 9 attempts remaining
Update: I was asked in the comments for the output of rs.status():
{ "set" : "<skippedsetname>",
"date" : ISODate("2016-05-04T15:35:06.717Z"),
"myState" : 5,
"syncingTo" : "<skipped domain name of other server>:27017",
"members" : [
{
"_id" : 0,
"name" : "<skipped domain name of this server>:27017",
"health" : 1,
"state" : 5,
"stateStr" : "STARTUP2",
"uptime" : 29,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"syncingTo" : "<skipped domain name of other server>:27017",
"configVersion" : 9,
"self" : true
},
{
"_id" : 2,
"name" : "10.0.1.7:27017",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 26,
"lastHeartbeat" : ISODate("2016-05-04T15:35:05.859Z"),
"lastHeartbeatRecv" : ISODate("2016-05-04T15:35:06.347Z"),
"pingMs" : 3,
"configVersion" : 9
},
{
"_id" : 3,
"name" : "<skipped domain name of other server>:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 26,
"optime" : Timestamp(1462376105, 196),
"optimeDate" : ISODate("2016-05-04T15:35:05Z"),
"lastHeartbeat" : ISODate("2016-05-04T15:35:05.859Z"),
"lastHeartbeatRecv" : ISODate("2016-05-04T15:35:06.086Z"),
"pingMs" : 4,
"electionTime" : Timestamp(1461688501, 1),
"electionDate" : ISODate("2016-04-26T16:35:01Z"),
"configVersion" : 9
}
],
"ok" : 1 }
Update: I should have mentioned, but didn't, that the hosting is Azure. The answer and explanation turn up easily when searching for "azure mongodb connection timeout". My bad.
Upvotes: 6
Views: 3035
Reputation: 1103
There are probably some files locking the filesystem on your slave. If I were you, I'd remove the node from the replica set, then wipe all files under dbpath, check that the mongo user can access this directory, and restart mongod. Once it's running, add it back to the RS and wait for it to finish syncing. See also: https://docs.mongodb.org/manual/tutorial/recover-data-following-unexpected-shutdown/#mongod-lock
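A minimal Python sketch of the "wipe dbpath and check that the mongo user can access it" step, to run before restarting mongod. The helper name check_dbpath is hypothetical, and the dbpath (/var/lib/mongodb) and service account (mongodb) in the example call are assumptions that vary by distribution; take the real values from your mongod.conf. The sketch only reports problems and never deletes anything.

```python
import os
import pwd
from pathlib import Path


def check_dbpath(dbpath: str, user: str) -> list[str]:
    """Return problems that would stop `user` from starting mongod on an empty dbpath.

    Hypothetical helper: it only inspects the directory, it never deletes files.
    """
    path = Path(dbpath)
    if not path.is_dir():
        return [f"{dbpath} does not exist or is not a directory"]

    problems = []
    # Leftover files (for example a stale mongod.lock) mean the directory was not wiped.
    leftovers = sorted(p.name for p in path.iterdir())
    if leftovers:
        problems.append(f"{dbpath} is not empty: {', '.join(leftovers)}")

    # The directory should be owned by the account mongod runs as.
    try:
        expected_uid = pwd.getpwnam(user).pw_uid
    except KeyError:
        return problems + [f"user {user!r} does not exist"]
    st = path.stat()
    if st.st_uid != expected_uid:
        problems.append(f"{dbpath} is owned by uid {st.st_uid}, expected {expected_uid} ({user})")

    # The owner needs read, write and traverse (execute) permission on the directory.
    if (st.st_mode & 0o700) != 0o700:
        problems.append(f"{dbpath} mode {oct(st.st_mode & 0o777)} lacks owner rwx")
    return problems


if __name__ == "__main__":
    # Assumed defaults; adjust to the dbpath and user from your own configuration.
    for problem in check_dbpath("/var/lib/mongodb", "mongodb"):
        print(problem)
```

An empty result means the directory is empty and owned by, and fully accessible to, the given user, which is the state mongod expects before an initial sync.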
Upvotes: 0
Reputation: 14698
Your assumption about the cause of the error is wrong.
Connection timed out: during the attempt to establish the TCP connection, no response came from the other side within a given time limit. In other words, it is an issue in establishing the socket, not a question of how long it takes to replicate the database.
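To confirm which error the log line is reporting, the errno can be decoded with Python's standard errno and os modules. This is only a small sketch; decode_errno is a hypothetical helper, and errno numbers are platform-specific (110 is ETIMEDOUT on Linux, but not on every OS), so the lookup uses the table of whatever machine it runs on.

```python
import errno
import os
import re

# Matches the "errno:NNN" fragment that appears in the NETWORK log line.
ERRNO_RE = re.compile(r"errno:(\d+)")


def decode_errno(log_line: str):
    """Return (code, symbolic name, message) for the errno in a log line, or None."""
    match = ERRNO_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    # Both lookups use the current platform's errno table.
    return code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


sample = "I NETWORK [rsSync] Socket say send() errno:110 Connection timed out {skipped ip}:27017"
print(decode_errno(sample))
```

On Linux the code maps to ETIMEDOUT, the same condition the log prints as "Connection timed out".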
The TCP timeout is a system-wide setting, not something you tune per application. On Linux the settings live in the system-wide /etc/sysctl.conf, and the one to play around with here is net.ipv4.tcp_syn_retries. However, you almost never change the timeout for establishing a socket (for any program, including mongo). The few times I have changed it, it was to make it shorter so the error shows up faster, not longer; increasing it is unlikely to be the right solution in any real-world application.
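As a rough worked example of what net.ipv4.tcp_syn_retries controls, the sketch below estimates how long connect() waits before failing. It assumes the usual Linux behaviour of a 1-second initial retransmission timeout that doubles after every unanswered SYN, capped at 120 seconds; newer kernels add refinements, so treat the numbers as approximations. Both function names are hypothetical.

```python
def syn_timeout_seconds(syn_retries: int, initial_rto: float = 1.0, rto_max: float = 120.0) -> float:
    """Approximate time before a Linux connect() gives up with ETIMEDOUT.

    Assumes exponential backoff: the wait after the first SYN is initial_rto,
    and it doubles after each retransmission, capped at rto_max.
    """
    total = 0.0
    rto = initial_rto
    # One wait after the original SYN plus one after each retransmission.
    for _ in range(syn_retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


def current_syn_retries(path: str = "/proc/sys/net/ipv4/tcp_syn_retries") -> int:
    """Read the live value of the sysctl (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


for retries in (2, 4, 6):
    print(retries, syn_timeout_seconds(retries))  # 7.0, 31.0 and 127.0 seconds
```

With the common default of 6 retries the estimate is 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 seconds, which shows why lowering the value makes a failing connection surface sooner.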
The problem is either a configuration problem, such as bad IP addresses in your setup, or a networking problem, such as a bad firewall, a bad routing table, or a network switch that sometimes stops working for 60-120 seconds at a time.
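One way to separate a configuration problem from a networking problem is to try a plain TCP connection from each member to every other member's host:port. The sketch below does only that; it does not speak the MongoDB protocol, check_members is a hypothetical helper, and the simple "host:port" parsing does not handle bracketed IPv6 addresses.

```python
import socket


def check_members(members, timeout: float = 5.0) -> dict:
    """Try a plain TCP connect to each "host:port" and report the outcome."""
    results = {}
    for member in members:
        host, _, port = member.rpartition(":")
        try:
            with socket.create_connection((host, int(port)), timeout=timeout):
                results[member] = "ok"
        except socket.timeout:
            # No answer at all: typical of a firewall silently dropping packets.
            results[member] = f"timed out after {timeout}s"
        except OSError as exc:
            # Refused, unreachable or unresolvable: often a wrong address or port.
            results[member] = f"error: {exc}"
    return results
```

For example, running check_members(["10.0.1.7:27017"]) from the syncing member, with the other member names taken from rs.status(), shows whether every address in the replica set configuration is reachable. Intermittent timeouts across repeated runs point toward the network rather than the configuration.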
Upvotes: 4