Reputation: 635
My system is a TomCat 7 server running on Ubuntu talking to a MongoDB cluster running in CentOS. We have this on AWS and it is working just fine.
I recently brought up the exact same thing on Azure and we are having constant, seemingly random timeouts when the tomcat app tries to query MongoDB. A typical error is:
Jan 31 08:13:54 catalina.out: Jan 31, 2014 4:14:09 PM com.mongodb.DBPortPool gotError
Jan 31 08:13:54 catalina.out: WARNING: emptying DBPortPool to xxx.cloudapp.net/xxx.xxx.xxx.xxx:21191 b/c of error
Jan 31 08:13:54 catalina.out: java.net.SocketException: Connection timed out
Jan 31 08:13:54 catalina.out: at java.net.SocketInputStream.socketRead0(Native Method)
Jan 31 08:13:54 catalina.out: at java.net.SocketInputStream.read(SocketInputStream.java:146)
Jan 31 08:13:54 catalina.out: at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
Jan 31 08:13:54 catalina.out: at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
Jan 31 08:13:54 catalina.out: at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
Jan 31 08:13:54 catalina.out: at org.bson.io.Bits.readFully(Bits.java:46)
Jan 31 08:13:54 catalina.out: at org.bson.io.Bits.readFully(Bits.java:33)
Jan 31 08:13:54 catalina.out: at org.bson.io.Bits.readFully(Bits.java:28)
Jan 31 08:13:54 catalina.out: at com.mongodb.Response.<init>(Response.java:40)
Jan 31 08:13:54 catalina.out: at com.mongodb.DBPort.go(DBPort.java:142)
Jan 31 08:13:54 catalina.out: at com.mongodb.DBPort.call(DBPort.java:92)
Jan 31 08:13:54 catalina.out: at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:244)
Jan 31 08:13:54 catalina.out: at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:216)
Jan 31 08:13:54 catalina.out: at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:288)
Jan 31 08:13:54 catalina.out: at com.mongodb.DB.command(DB.java:262)
Jan 31 08:13:54 catalina.out: at com.mongodb.DB.command(DB.java:244)
Jan 31 08:13:54 catalina.out: at com.mongodb.DBCollection.getCount(DBCollection.java:985)
Jan 31 08:13:54 catalina.out: at com.mongodb.DBCollection.getCount(DBCollection.java:956)
Jan 31 08:13:54 catalina.out: at com.mongodb.DBCollection.getCount(DBCollection.java:931)
Jan 31 08:13:54 catalina.out: at com.mongodb.DBCollection.count(DBCollection.java:878)
Jan 31 08:13:54 catalina.out: at com.eweware.service.base.store.impl.mongo.dao.BaseDAOImpl._exists(BaseDAOImpl.java:788)
Jan 31 08:13:54 catalina.out: at com.eweware.service.base.store.impl.mongo.dao.GroupDAOImpl._exists(GroupDAOImpl.java:18)
I am using Java driver 2.11.4 and initializing it as follows:
builder.autoConnectRetry(true)
.connectionsPerHost(10)
.writeConcern(WriteConcern.FSYNCED)
.connectTimeout(30000)
.socketKeepAlive(true);
In reading around the interwebs I saw some materials suggesting that there is an Azure issue and some C# suggestions but I have not seen anything on how to correct it from Java.
Some more details:
Here is the retry code I am trying:
private DBObject findOneRetry(DBObject criteria, DBObject fields, DBCollection collection) throws SystemErrorException { DBObject obj = null; for (int attempt = 1; attempt < MAX_RETRIES; attempt++) { try { obj = collection.findOne(criteria, fields); // getting SocketException inside here return obj; } catch (Exception e) { if (attempt > MAX_RETRIES) { throw new SystemErrorException(makeErrorMessage("findOneRetry", "find", attempt, e, null), e, ErrorCodes.SERVER_DB_ERROR); } else { logger.warning(getClass().getName() + ": findOneRetry failed and will retry in attempt #" + attempt + " in collection " + _getCollection()); } } } return obj; }
Any suggestions on how to correct?
Thanks in advance!
Upvotes: 4
Views: 1297
Reputation: 684
This is due to high tcp_keepalive_time in your CentOS server.
In your MongoS server: sudo nano /proc/sys/net/ipv4/tcp_keepalive_time Change 7200 to 60
Restart your Azure instance.
Update: In order to make sure your VM always has the value of tcp_keepalive_time:
Add this line:
bash -c 'echo 60 > /proc/sys/net/ipv4/tcp_keepalive_time'
to:
/etc/rc.d/rc.local
Update to the update:
For most flavors for Linux, there is a /etc/sysctl.d/
directory. Create a file, e.g. mongo.conf
containing:
net.ipv4.tcp_keepalive_time = 60
Put the file in that directory and run:
sysctl -p /etc/sysctl.d/mongo.conf
Verify the change with:
sysctl net.ipv4.tcp_keepalive_time
This will survive reboots in RedHat-based and Debian-based systems I've come across. You'll want to be sure and check /etc/sysctl.conf
and any other files in /etc/sysctl.d
to see if the variable is already being set to something else and make appropriate changes.
Upvotes: 5