Reputation: 3459
For the past 3 months, my MongoDB server has been getting very slow every 2 hours and 10 minutes, with remarkable regularity.
My Server configuration:
What I did after searching stackoverflow and google:
My guess is that something is taking a lock; the most likely cause is index building. There are a few special things about my database:
I have been struggling with this issue for 3 months. Any comments/suggestions will be highly appreciated!
Here are some logs from my log file:
Fri Jul 5 15:20:11.040 [conn2765] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after indexCounters: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after recordStats: 222694, after repl: 222694, at end: 222694 }
Fri Jul 5 17:30:09.367 [conn4711] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after indexCounters: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after recordStats: 199498, after repl: 199498, at end: 199528 }
Fri Jul 5 19:40:12.697 [conn6488] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after indexCounters: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after recordStats: 204061, after repl: 204061, at end: 204081 }
Here is a screenshot of my Pingdom report: the server goes down for 4 minutes every 2 hours and 7 minutes. In the beginning, the server went down for 2 minutes every 2 hours and 6 minutes.
[EDIT 1] More monitoring results from my hosting provider: CPU http://i.minus.com/iZBNyMPzLSLRr.png DiskIO http://i.minus.com/ivgrHr0Ghoz92.png Connections http://i.minus.com/itbfYq0SSMlNs.png The periodic increase in connections happens because connections queue up while the database is blocked, so the current-connection count accumulates until the database is unblocked. It is not caused by heavy traffic.
Upvotes: 9
Views: 5113
Reputation: 10034
We found a specific 2:10 issue. In our case, it was caused by MMS executing dbStats. We had to upgrade the cluster, and the issue was resolved.
Upvotes: 3
Reputation: 19213
I've had a similar issue. I'd start with mongostat / mongotop and work your way from there. Identify the predominant workload with mongostat, and then find out which collection is causing that activity.
In my particular case, I have a cron job that deletes obsolete records. It turns out that the way the replica set propagates these commands is extremely resource-intensive. For example, I would delete 3M records from a collection on the replica set primary, and for some reason this propagation makes all the secondaries work intensively afterwards.
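One common way to reduce that replication strain (a hedged sketch, not something the original answer spells out) is to delete in small batches rather than in one huge remove, so each chunk of oplog work the secondaries replay stays small. A minimal sketch of the batching logic, where `deleteBatch` is a hypothetical callback standing in for a real driver call such as `collection.deleteMany({_id: {$in: batch}})`:

```javascript
// Sketch: delete a large set of ids in fixed-size batches so that
// each replicated write stays small. `deleteBatch` is a hypothetical
// callback standing in for a real driver call.
function batchedDelete(ids, batchSize, deleteBatch) {
  let deleted = 0;
  for (let i = 0; i < ids.length; i += batchSize) {
    const batch = ids.slice(i, i + batchSize);
    deleteBatch(batch); // one small remove per batch
    deleted += batch.length;
  }
  return deleted;
}
```

In a real job you would also pause briefly between batches to give the secondaries time to catch up.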
If you can see things in db.currentOp, I would focus on the operations with long running times and try to pinpoint the root cause by elimination from there.
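For example, the filtering of db.currentOp() output can be done in plain JavaScript in the mongo shell. A sketch, assuming the standard `inprog` array shape of the currentOp result and using an arbitrary 5-second threshold:

```javascript
// Sketch: pick out long-running operations from a currentOp() result
// document ({inprog: [...]}, the shape returned by db.currentOp()).
// The threshold is an arbitrary example value.
function longRunningOps(currentOp, thresholdSecs) {
  return currentOp.inprog
    .filter(op => (op.secs_running || 0) >= thresholdSecs)
    .map(op => ({ opid: op.opid, ns: op.ns, secs: op.secs_running }));
}

// In the mongo shell this could be used as:
//   longRunningOps(db.currentOp(), 5)
```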
Hope that helps.
Upvotes: 2
Reputation: 1630
I think you mean a replica set with 3 nodes instead of "3 replica set".
If you are still experiencing the same issue, here is my opinion:
Since you are running your server on linode.com, your server is actually a virtual machine and you are sharing resources with others. The periodic slowdown may be caused by other tenants generating disk load periodically. Since you have already looked into so many other possibilities, this may be worth investigating even though it takes some effort.
This is definitely caused by a job run by MongoDB or by your system, so try to look for any job that runs regularly. For example, try removing the 3600-second delay on one of your secondaries. Even though that is not 2 hours and 10 minutes, it may be a trigger.
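Removing that delay means reconfiguring the replica set. A sketch of the config manipulation, assuming the delayed member uses the legacy `slaveDelay` field and that the member `_id` is known (in the mongo shell you would pass the result to rs.reconfig()):

```javascript
// Sketch: clear the slaveDelay on one member of a replica set config
// object (the shape returned by rs.conf() in the mongo shell).
// `memberId` is the _id of the delayed secondary.
function clearSlaveDelay(conf, memberId) {
  conf.members = conf.members.map(m =>
    m._id === memberId ? Object.assign({}, m, { slaveDelay: 0 }) : m
  );
  return conf;
}

// Hypothetical mongo shell usage:
//   var conf = rs.conf();
//   rs.reconfig(clearSlaveDelay(conf, 2));
```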
I can't post my suggestions as a comment since the site doesn't allow me to, so I am posting this as an answer.
Upvotes: 1