Dushyant Bangal

Reputation: 6403

mongodb service crashes after saying serverStatus was very slow

I have a MongoDB service on EC2. It crashes on its own after some time. When I ran systemctl status mongodb, I got the following output:

● mongodb.service - High-performance, schema-free document-oriented database
   Loaded: loaded (/etc/systemd/system/mongodb.service; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Fri 2017-02-17 02:00:00 UTC; 3h 37min ago
 Main PID: 1152 (code=killed, signal=KILL)

Feb 17 02:00:00 ip-172-31-29-240 systemd[1]: mongodb.service: Main process exited, code=killed, status=9/KILL
Feb 17 02:00:00 ip-172-31-29-240 systemd[1]: mongodb.service: Unit entered failed state.
Feb 17 02:00:00 ip-172-31-29-240 systemd[1]: mongodb.service: Failed with result 'signal'.
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

When I checked the /var/log/mongodb/mongod.log file, I got this:

2017-02-20T11:11:05.624+0000 I CONTROL  [main] ***** SERVER RESTARTED *****
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] MongoDB starting : pid=29177 port=27017 dbpath=/var/lib/mongodb 64-bit host=ip-172-31-29-240
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] db version v3.2.11
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] git version: 009580ad490190ba33d1c6253ebd8d91808923e4
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] modules: none
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] build environment:
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten]     distmod: ubuntu1604
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten]     distarch: x86_64
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2017-02-20T11:11:05.638+0000 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "127.0.0.1", port: 27017 }, storage: { dbPath: "/var/lib/mongodb", journal: { enabled: true } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log", quiet: true } }
2017-02-20T11:11:05.665+0000 I -        [initandlisten] Detected data files in /var/lib/mongodb created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-02-20T11:11:05.665+0000 W -        [initandlisten] Detected unclean shutdown - /var/lib/mongodb/mongod.lock is not empty.
2017-02-20T11:11:05.665+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2017-02-20T11:11:05.665+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] 
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] 
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2017-02-20T11:11:06.246+0000 I CONTROL  [initandlisten] 
2017-02-20T11:11:06.280+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/var/lib/mongodb/diagnostic.data'
2017-02-20T11:11:06.280+0000 I NETWORK  [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2017-02-20T11:11:06.283+0000 I NETWORK  [initandlisten] waiting for connections on port 27017
2017-02-20T11:11:28.189+0000 I COMMAND  [conn3] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { myFlag: true, timestamp: { $lt: new Date(1485907200000) } } }, { $group: { _id: null, startDate: { $min: "$timestamp" }, previousCount: { $sum: 1 } } }, { $project: { _id: 0, startDate: 1, previousCount: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:236 reslen:114 locks:{ Global: { acquireCount: { r: 478 } }, Database: { acquireCount: { r: 239 } }, Collection: { acquireCount: { r: 239 } } } protocol:op_query 2605ms
2017-02-20T11:11:28.192+0000 I COMMAND  [conn8] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { myFlag: true, timestamp: { $lt: new Date(1485907200000) } } }, { $group: { _id: null, startDate: { $min: "$timestamp" }, previousCount: { $sum: 1 } } }, { $project: { _id: 0, startDate: 1, previousCount: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:244 reslen:114 locks:{ Global: { acquireCount: { r: 494 } }, Database: { acquireCount: { r: 247 } }, Collection: { acquireCount: { r: 247 } } } protocol:op_query 2764ms
2017-02-20T11:11:28.235+0000 I COMMAND  [conn5] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910685417) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:244 reslen:538 locks:{ Global: { acquireCount: { r: 494 } }, Database: { acquireCount: { r: 247 } }, Collection: { acquireCount: { r: 247 } } } protocol:op_query 2817ms
2017-02-20T11:11:28.303+0000 I COMMAND  [conn8] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910685581) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:144 reslen:538 locks:{ Global: { acquireCount: { r: 294 } }, Database: { acquireCount: { r: 147 } }, Collection: { acquireCount: { r: 147 } } } protocol:op_query 110ms
2017-02-20T11:11:28.438+0000 I COMMAND  [conn3] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910688306) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:144 reslen:538 locks:{ Global: { acquireCount: { r: 294 } }, Database: { acquireCount: { r: 147 } }, Collection: { acquireCount: { r: 147 } } } protocol:op_query 129ms
2017-02-20T11:11:28.444+0000 I COMMAND  [conn5] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910688272) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:144 reslen:538 locks:{ Global: { acquireCount: { r: 294 } }, Database: { acquireCount: { r: 147 } }, Collection: { acquireCount: { r: 147 } } } protocol:op_query 154ms
2017-02-20T11:12:15.504+0000 I COMMAND  [conn1] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910735374) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:143 reslen:538 locks:{ Global: { acquireCount: { r: 292 } }, Database: { acquireCount: { r: 146 } }, Collection: { acquireCount: { r: 146 } } } protocol:op_query 128ms
2017-02-20T11:12:15.582+0000 I COMMAND  [conn5] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910735439) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:145 reslen:538 locks:{ Global: { acquireCount: { r: 296 } }, Database: { acquireCount: { r: 148 } }, Collection: { acquireCount: { r: 148 } } } protocol:op_query 140ms
2017-02-20T11:12:15.588+0000 I COMMAND  [conn8] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { timestamp: { $gte: new Date(1484910735428) } } }, { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }, count: { $sum: 1 } } }, { $sort: { _id: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:144 reslen:538 locks:{ Global: { acquireCount: { r: 294 } }, Database: { acquireCount: { r: 147 } }, Collection: { acquireCount: { r: 147 } } } protocol:op_query 155ms
2017-02-20T11:12:24.765+0000 I COMMAND  [conn3] query myDataBase.myCollection query: { $query: {}, orderby: { timestamp: -1 } } planSummary: COLLSCAN, COLLSCAN cursorid:47222766812 ntoreturn:200 ntoskip:0 keysExamined:0 docsExamined:18423 hasSortStage:1 keyUpdates:0 writeConflicts:0 numYields:145 nreturned:37 reslen:4578802 locks:{ Global: { acquireCount: { r: 292 } }, Database: { acquireCount: { r: 146 } }, Collection: { acquireCount: { r: 146 } } } 177ms
2017-02-20T11:12:25.268+0000 I COMMAND  [conn3] getmore myDataBase.myCollection planSummary: COLLSCAN, COLLSCAN cursorid:47222766812 ntoreturn:200 keysExamined:0 docsExamined:18423 keyUpdates:0 writeConflicts:0 numYields:151 nreturned:200 reslen:2664936 locks:{ Global: { acquireCount: { r: 304 } }, Database: { acquireCount: { r: 152 } }, Collection: { acquireCount: { r: 152 } } } 321ms
2017-02-20T11:12:25.518+0000 I COMMAND  [conn3] command myDataBase.myCollection command: aggregate { aggregate: "myCollection", pipeline: [ { $match: { myFlag: true, timestamp: { $lt: new Date(1485907200000) } } }, { $group: { _id: null, startDate: { $min: "$timestamp" }, previousCount: { $sum: 1 } } }, { $project: { _id: 0, startDate: 1, previousCount: 1 } } ] } keyUpdates:0 writeConflicts:0 numYields:144 reslen:114 locks:{ Global: { acquireCount: { r: 294 } }, Database: { acquireCount: { r: 147 } }, Collection: { acquireCount: { r: 147 } } } protocol:op_query 173ms
2017-02-20T11:12:25.691+0000 I COMMAND  [conn3] command myDataBase.compressionstats command: aggregate { aggregate: "compressionstats", pipeline: [ { $group: { _id: null, originalDataSum: { $sum: "$originalDataSize" }, compressedDataSum: { $sum: "$compressedDataSize" } } } ] } keyUpdates:0 writeConflicts:0 numYields:2 reslen:125 locks:{ Global: { acquireCount: { r: 10 } }, Database: { acquireCount: { r: 5 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_query 127ms
2017-02-20T11:12:26.525+0000 I WRITE    [conn3] update myDataBase.sessions query: { _id: "doOtZATLM5R61__wpN-PmFH_80BMIpFH" } update: { _id: "doOtZATLM5R61__wpN-PmFH_80BMIpFH", session: "{"cookie":{"originalMaxAge":null,"expires":null,"httpOnly":true,"path":"/"},"passport":{"user":"587cc6630bd9ff5c39e54e94"}}", expires: new Date(1488798745083) } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keyUpdates:1 writeConflicts:0 numYields:1 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } 601ms
2017-02-20T11:12:26.532+0000 I COMMAND  [conn2] command secondDB.secondCollection command: getMore { getMore: 16340341641, collection: "secondCollection", batchSize: 1000 } planSummary: COLLSCAN cursorid:16340341641 keysExamined:0 docsExamined:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:0 reslen:103 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_query 1372ms
2017-02-20T11:12:26.570+0000 I COMMAND  [conn3] command myDataBase.$cmd command: update { update: "sessions", writeConcern: { w: 1 }, ordered: true, updates: [ { q: { _id: "doOtZATLM5R61__wpN-PmFH_80BMIpFH" }, u: { _id: "doOtZATLM5R61__wpN-PmFH_80BMIpFH", session: "{"cookie":{"originalMaxAge":null,"expires":null,"httpOnly":true,"path":"/"},"passport":{"user":"587cc6630bd9ff5c39e54e94"}}", expires: new Date(1488798745083) }, multi: false, upsert: true } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } protocol:op_query 702ms
2017-02-20T11:12:26.905+0000 I COMMAND  [conn3] query myDataBase.users query: { _id: ObjectId('587cc6630bd9ff5c39e54e94') } planSummary: IDHACK ntoskip:0 keysExamined:1 docsExamined:1 idhack:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:1 nreturned:1 reslen:524 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } 204ms
2017-02-20T11:12:27.126+0000 I COMMAND  [conn3] killcursors myDataBase.myCollection keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 86ms
2017-02-20T11:12:33.002+0000 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 120, after asserts: 170, after connections: 370, after extra_info: 580, after globalLock: 690, after locks: 990, after network: 1070, after opcounters: 1170, after opcountersRepl: 1240, after storageEngine: 1680, after tcmalloc: 1910, after wiredTiger: 2390, at end: 2820 }

NOTE: The service status output and the log are from different days; however, the issue was the same, I just didn't copy the service status output this time.
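Since systemd reports the main process was killed with signal 9 (code=killed, status=9/KILL), my first suspicion is the kernel OOM killer. A quick way to check, as a sketch (the exact message wording varies by kernel version):

    # Search the kernel log for OOM-killer activity around the crash time
    dmesg -T | grep -iE 'out of memory|killed process'

    # Or query the kernel messages from the systemd journal
    journalctl -k | grep -iE 'out of memory|killed process'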

Upvotes: 8

Views: 8566

Answers (3)

Julius Š.

Reputation: 174

The issue we were facing was that our storage.dbPath pointed to a mounted drive that had been mounted without the noatime flag, which the production operations checklist recommends.
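To illustrate, you can verify the current mount options and add noatime in /etc/fstab. This is a minimal sketch; the UUID and the mount point below are placeholders, not values from our setup:

    # Check which options the data volume is currently mounted with
    mount | grep /var/lib/mongodb

    # Example /etc/fstab entry with noatime (UUID is a placeholder):
    # UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /var/lib/mongodb  ext4  defaults,noatime  0  2

    # Remount so the new option takes effect
    sudo mount -o remount /var/lib/mongodb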

Upvotes: 0

KayV

Reputation: 13835

The root cause of the problem could be slow queries running on the MongoDB server.

To find the slow queries, follow these steps:

  1. Log in to the mongo console.

  2. Switch to your database (use <db name>).

  3. Find queries or operations that are slow using the following command:

    db.currentOp({"secs_running": {$gte: 3}})

    This returns the operations that have been running for more than 3 seconds. The output will look something like this:

    rs:SECONDARY> db.currentOp({"secs_running": {$gte: 3}});
    {
        "inprog" : [
            {
                "desc" : "tttt",
                "threadId" : "140558301292288",
                "connectionId" : 307551,
                "client" : "101.01.01.11:49128",
                "appName" : "MongoDB Shell",
                "clientMetadata" : {
                    "application" : {
                        "name" : "MongoDB Shell"
                    },
                    "driver" : {
                        "name" : "MongoDB Internal Client",
                        "version" : "3.0.5-18-g7e327a9"
                    },
                    "os" : {
                        "type" : "Darwin",
                        "name" : "Mac OS X",
                        "architecture" : "x86_64",
                        "version" : "19.3.0"
                    }
                },
                "active" : true,
                "opid" : 912509900,
                "secs_running" : 7711,
                "op" : "query",
                "ns" : "mydb.mytable",
                "query" : {
                    "find" : "mytable",
                    "filter" : {
                        "name" : {
                            "$type" : 3
                        }
                    }
                },
                "planSummary" : "COLLSCAN",
                "numYields" : 333308,
                "locks" : {
                    "Global" : "r",
                    "Database" : "r",
                    "Collection" : "r"
                },
                "waitingForLock" : false,
                "lockStats" : {
                    "Global" : {
                        "acquireCount" : {
                            "r" : NumberLong(666618)
                        },
                        "acquireWaitCount" : {
                            "r" : NumberLong(254219)
                        },
                        "timeAcquiringMicros" : {
                            "r" : NumberLong("3454835364")
                        }
                    },
                    "Database" : {
                        "acquireCount" : {
                            "r" : NumberLong(333309)
                        }
                    },
                    "Collection" : {
                        "acquireCount" : {
                            "r" : NumberLong(333309)
                        }
                    }
                }
            }
        ],
        "ok" : 1
    }

  4. Now you can kill an operation that is taking too long by passing its "opid" from the list above to db.killOp():

    db.killOp(912509900)
    
  5. Hopefully the overall system load will come down and you will stop seeing the "serverStatus was very slow" errors.

  6. The system should then not crash anymore, either. If the same slow queries keep reappearing, the database profiler can record them for later analysis; see the sketch after this list.
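For catching slow queries that you miss while watching db.currentOp(), the built-in database profiler can record them as they happen. A minimal mongo shell sketch (the 100 ms threshold is an arbitrary example, not from the original answer):

    // Profiling level 1: record only operations slower than 100 ms
    db.setProfilingLevel(1, 100)

    // Later, inspect the recorded slow operations, most recent first
    db.system.profile.find().sort({ ts: -1 }).limit(5).pretty()

    // Turn profiling off again when done, since it adds some overhead
    db.setProfilingLevel(0)

Operations that show planSummary: COLLSCAN on a large collection are usually the first candidates for an index.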

Upvotes: 0

Gerd

Reputation: 2603

I had the same problem. After increasing the instance's memory, the error disappeared.
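If you want to confirm memory pressure before resizing the instance, the mongo shell can show how full the WiredTiger cache is. A small sketch (the field names are as reported in the wiredTiger section of serverStatus):

    // Compare the configured WiredTiger cache size with current usage
    var cache = db.serverStatus().wiredTiger.cache;
    print("maximum bytes configured: " + cache["maximum bytes configured"]);
    print("bytes currently in cache: " + cache["bytes currently in the cache"]);

    // Resident and virtual memory of the mongod process, in MB
    printjson(db.serverStatus().mem);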

Upvotes: 4
