Kiran Pamula
Kiran Pamula

Reputation: 11

mongod 4.4 CPU hogs/stalls leading to client queries timedout

``Hi Team,

Since migrating to MongoDB 4.4, we have been encountering a recurring issue of client queries timing out frequently. Upon monitoring the CPU usage during these occurrences, we have noticed that mongod's CPU usage is unusually high, despite there not being a significant increase in operations.

This erratic CPU usage has been observed intermittently, and our application has a monitoring thread that checks the replicStatus and reports NO_PRIMARY during such times.

Mongo- Server Version: 4.4.18

Mongo Client Java Driver version: 3.12.9

Standalone or Replica : 4 data bearing + 1 arbiter

Server Option:

root 52039 161 2.5 12649240 7229872 ? S<Ll Apr06 4415:45 /usr/bin/mongod --ipv6 --slowms 500 --storageEngine wiredTiger --wiredTigerCacheSizeGB 22 --enableMajorityReadConcern false --bind_ip_all --keyFile /root/.dbkey --port 27717 --dbpath=/var/data/sessions.1/h --replSet set01h --fork --pidfilepath /var/run/sessionmgr-27717.pid --oplogSize 5120 --logpath /var/log/mongodb-27717.log --logappend --quiet

Storage Engine : WT

Application Monitors the replicaStatus periodically reporting, TYPE=UNKNOWN for PRIMARY intermittently.

2023-04-07 23:02:30,944 [pool-2-thread-1] replicaset status: ReplicaSetStatus{name=set01h, cluster=ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=nd2bwa3psm16vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion

{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=795436, setName='set01h', canonicalAddress=nd2bwa3psm16vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:30 UTC 2023, lastUpdateTimeNanos=116414267859844}, ServerDescription{address=nd2bwa3psm15va:27717, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=nd2bwa3psm16va:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]} , minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=1152849, setName='set01h', canonicalAddress=nd2bwa3psm16va:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:30 UTC 2023, lastUpdateTimeNanos=116414649210810}, ServerDescription{address=nd2bwa3psm15vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion

{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=943462, setName='set01h', canonicalAddress=nd2bwa3psm15vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:30 UTC 2023, lastUpdateTimeNanos=116414619113856}]}}

Immediately in the next sec, the replicaSet detects PRIMARY

2023-04-07 23:02:31,260 [pool-2-thread-1] replicaset status: ReplicaSetStatus{name=set01h, cluster=ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=nd2bwa3psm16vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]} , minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=772997, setName='set01h', canonicalAddress=nd2bwa3psm16vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116414767866623}, ServerDescription{address=nd2bwa3psm15va:27717, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=16803136, setName='set01h', canonicalAddress=nd2bwa3psm15va:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=7fffffff0000000000000080, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116414870305431}, ServerDescription{address=nd2bwa3psm16va:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion

{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=1152849, setName='set01h', canonicalAddress=nd2bwa3psm16va:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:30 UTC 2023, lastUpdateTimeNanos=116414649210810}, ServerDescription{address=nd2bwa3psm15vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]} , minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=943462, setName='set01h', canonicalAddress=nd2bwa3psm15vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:30 UTC 2023, lastUpdateTimeNanos=116414619113856}]}}

But again moves to UKNOWN and later to a proper state

2023-04-07 23:02:31,544 [pool-2-thread-1] replicaset status: ReplicaSetStatus{name=set01h, cluster=ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=nd2bwa3psm16vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion

{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=729934, setName='set01h', canonicalAddress=nd2bwa3psm16vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415267757469}, ServerDescription{address=nd2bwa3psm15va:27717, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=nd2bwa3psm16va:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]} , minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=1059673, setName='set01h', canonicalAddress=nd2bwa3psm16va:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415149889939}, ServerDescription{address=nd2bwa3psm15vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion

{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=941532, setName='set01h', canonicalAddress=nd2bwa3psm15vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415120136591}]}}

2023-04-07 23:02:31,764 [pool-3-thread-1] replicaset status: ReplicaSetStatus{name=set01h, cluster=ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=nd2bwa3psm16vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]} , minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=729934, setName='set01h', canonicalAddress=nd2bwa3psm16vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415267757469}, ServerDescription{address=nd2bwa3psm15va:27717, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=13549634, setName='set01h', canonicalAddress=nd2bwa3psm15va:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=7fffffff0000000000000080, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415370985974}, ServerDescription{address=nd2bwa3psm16va:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion

{versionList=[4, 4, 18]}, minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=1059673, setName='set01h', canonicalAddress=nd2bwa3psm16va:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415149889939}, ServerDescription{address=nd2bwa3psm15vb:27717, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 4, 18]} , minWireVersion=0, maxWireVersion=9, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=941532, setName='set01h', canonicalAddress=nd2bwa3psm15vb:27717, hosts=[nd2bwa3psm16va:27717, nd2bwa3psm15va:27717, nd2bwa3psm16vb:27717, nd2bwa3psm15vb:27717], passives=[], arbiters=[], primary='nd2bwa3psm15va:27717', tagSet=TagSet{[]}, electionId=null, setVersion=27149688, lastWriteDate=Fri Apr 07 23:02:31 UTC 2023, lastUpdateTimeNanos=116415120136591}]}}

top - 23:02:27 up 1 day, 7:17, 0 users, load average: 21.07, 23.62, 26.89 Tasks: 303 total, 1 running, 301 sleeping, 0 stopped, 1 zombie %Cpu(s): 56.5 us, 15.2 sy, 0.0 ni, 22.7 id, 0.0 wa, 2.1 hi, 3.5 si, 0.0 st MiB Mem : 273809.9 total, 204977.1 free, 30793.4 used, 38039.4 buff/cache MiB Swap: 4096.0 total, 4096.0 free, 0.0 used. 205453.0 avail Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND

52039 root 5 -15 12.0g 7.2g 36920 S 237.7 2.7 2977:27 mongod

– top - 23:02:30 up 1 day, 7:17, 0 users, load average: 21.07, 23.62, 26.89 Tasks: 301 total, 2 running, 299 sleeping, 0 stopped, 0 zombie %Cpu(s): 87.8 us, 4.8 sy, 0.0 ni, 5.3 id, 0.0 wa, 0.9 hi, 1.2 si, 0.0 st MiB Mem : 273809.9 total, 204977.1 free, 30793.4 used, 38039.4 buff/cache MiB Swap: 4096.0 total, 4096.0 free, 0.0 used. 205453.0 avail Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND

52039 root 5 -15 12.1g 7.3g 36920 S 999.9 2.7 2977:48 mongod

PID : 52039 mongostats insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time 65 1449 969 68 701 821|0 1.9% 30.2% 0 12.0G 7.19G 0|1 1|1 5.79m 14.8m 741 set01h PRI Apr 7 23:02:21.449 71 1572 1063 76 870 959|0 1.7% 30.2% 0 12.0G 7.20G 0|0 2|0 6.45m 16.2m 744 set01h PRI Apr 7 23:02:22.426 69 1581 1064 74 841 988|0 1.8% 30.2% 0 12.0G 7.20G 0|0 1|0 6.53m 16.2m 744 set01h PRI Apr 7 23:02:23.423 88 1602 1090 77 793 870|0 1.8% 30.3% 0 12.0G 7.20G 0|0 1|1 6.55m 16.5m 741 set01h PRI Apr 7 23:02:24.431 67 1507 1026 78 954 1013|0 1.9% 30.1% 0 12.0G 7.20G 0|0 1|1 6.44m 15.8m 741 set01h PRI Apr 7 23:02:25.427 59 1601 1122 86 778 817|0 1.9% 30.1% 0 12.0G 7.19G 0|1 1|0 6.48m 16.6m 741 set01h PRI Apr 7 23:02:26.419 81 1609 1068 73 779 908|0 2.0% 30.2% 0 12.0G 7.19G 0|1 1|0 6.46m 16.3m 741 set01h PRI Apr 7 23:02:27.435 15 417 189 32 225 364|0 1.9% 30.1% 0 11.9G 7.31G 0|0 2|8 1.44m 2.93m 612 set01h PRI Apr 7 23:02:30.352 *0 184 *0 9 18 2643|0 1.9% 30.2% 0 11.8G 7.32G 0|0 1|1 318k 2.89m 413 set01h PRI Apr 7 23:02:30.564 140 1499 809 94 364 6105|0 2.0% 30.2% 0 11.6G 7.31G 0|0 1|5 5.90m 17.7m 292 set01h PRI Apr 7 23:02:31.422 insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl

Upvotes: 1

Views: 154

Answers (1)

R2D2
R2D2

Reputation: 10737

  1. According to the compatibility matrix your client java driver version 3.12.9 is not compatible with mongod version 4.4 see here (The Driver version will work with the MongoDB version, but not all new MongoDB features are supported) , you will need to upgrade to at least java driver version 4.1.
  1. What do you see when you execute "top -H PID(mongod)" when your mongod is using heavily your CPU?

  2. Provide some more details about your database schema , size and executed update,delete,find queries?

Upvotes: 0

Related Questions