Vineel

Reputation: 1788

MongoDB rollback and replica set issue

We have a three-member MongoDB replica set. One of the nodes tries to do a rollback and fails, so its MongoDB pod is stuck in a crash loop. What is the "stable timestamp" in MongoDB? What is the "top" of the oplog?

2021-04-08T06:51:06.690+0000 I STORAGE [initandlisten] WiredTiger record store oplog processing took 554ms
2021-04-08T06:51:06.693+0000 I STORAGE [initandlisten] Timestamp monitor starting
2021-04-08T06:51:06.697+0000 I CONTROL [initandlisten]
2021-04-08T06:51:06.697+0000 I CONTROL [initandlisten] ** WARNING: You are running on a NUMA machine.
2021-04-08T06:51:06.697+0000 I CONTROL [initandlisten] ** We suggest launching mongod like this to avoid performance problems:
2021-04-08T06:51:06.697+0000 I CONTROL [initandlisten] ** numactl --interleave=all mongod [other options]
2021-04-08T06:51:06.698+0000 I CONTROL [initandlisten]
2021-04-08T06:51:06.718+0000 I SHARDING [initandlisten] Marking collection local.system.replset as collection version: <unsharded>
2021-04-08T06:51:06.720+0000 I STORAGE [initandlisten] Flow Control is enabled on this deployment.
2021-04-08T06:51:06.720+0000 I SHARDING [initandlisten] Marking collection admin.system.roles as collection version: <unsharded>
2021-04-08T06:51:06.720+0000 I SHARDING [initandlisten] Marking collection admin.system.version as collection version: <unsharded>
2021-04-08T06:51:06.722+0000 I SHARDING [initandlisten] Marking collection local.startup_log as collection version: <unsharded>
2021-04-08T06:51:06.722+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/var/data/mongodb/diagnostic.data'
2021-04-08T06:51:06.723+0000 I SHARDING [initandlisten] Marking collection local.replset.minvalid as collection version: <unsharded>
2021-04-08T06:51:06.723+0000 I SHARDING [initandlisten] Marking collection local.replset.election as collection version: <unsharded>
2021-04-08T06:51:06.725+0000 I REPL [initandlisten] Rollback ID is 2
2021-04-08T06:51:06.726+0000 I REPL [initandlisten] Recovering from stable timestamp: Timestamp(1617686958, 1) (top of oplog: { ts: Timestamp(1617686416, 1), t: 32 } , appliedThrough: { ts: Timestamp(1617686958, 1), t: 36 } , TruncateAfter: Timestamp(0, 0))
2021-04-08T06:51:06.726+0000 I REPL [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1617686958, 1)
2021-04-08T06:51:06.726+0000 F REPL [initandlisten] Applied op { : Timestamp(1617686958, 1) } not found. Top of oplog is { : Timestamp(1617686416, 1) }.
2021-04-08T06:51:06.726+0000 F - [initandlisten] Fatal Assertion 40313 at src/mongo/db/repl/replication_recovery.cpp 511
2021-04-08T06:51:06.726+0000 F - [initandlisten] \n\n***aborting after fassert() failure\n\n

Update: More precisely, I am trying to understand what the stable timestamp referred to here is. Why would the top of the oplog matter?

Recovering from stable timestamp: Timestamp(1617686958, 1) (top of oplog: { ts: Timestamp(1617686416, 1), t: 32 } , appliedThrough: { ts: Timestamp(1617686958, 1), t: 36 } , TruncateAfter: Timestamp(0, 0))

Why does the assertion fail here?

2021-04-08T06:51:06.726+0000 F REPL [initandlisten] Applied op { : Timestamp(1617686958, 1) } not found. Top of oplog is { : Timestamp(1617686416, 1) }.
2021-04-08T06:51:06.726+0000 F - [initandlisten] Fatal Assertion 40313 at src/mongo/db/repl/replication_recovery.cpp 511

Upvotes: 0

Views: 1898

Answers (1)

JJussi

Reputation: 1580

The node tries to roll back, but the entries it needs are not in its oplog.
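To make the mismatch concrete, here is a small Python sketch that pulls the timestamps out of the "Recovering from stable timestamp" line quoted in the question and compares them. It only parses the log text; it does not talk to MongoDB. The reading in the comments (recovery starts at the stable timestamp and expects to find that entry in the oplog) is my understanding of the fatal message, not something taken from the server source, and the names `parse_timestamps` and `LOG_LINE` are made up for this example.

```python
import re

# Log line copied from the question (joined onto one line).
LOG_LINE = (
    "Recovering from stable timestamp: Timestamp(1617686958, 1) "
    "(top of oplog: { ts: Timestamp(1617686416, 1), t: 32 } , "
    "appliedThrough: { ts: Timestamp(1617686958, 1), t: 36 } , "
    "TruncateAfter: Timestamp(0, 0))"
)


def parse_timestamps(line):
    """Return every Timestamp(seconds, increment) in the line, in order."""
    pairs = re.findall(r"Timestamp\((\d+), (\d+)\)", line)
    return [(int(seconds), int(increment)) for seconds, increment in pairs]


# Order in the log line: stable, top of oplog, appliedThrough, TruncateAfter.
stable, top_of_oplog, applied_through, truncate_after = parse_timestamps(LOG_LINE)

# Assumption: recovery wants to start applying oplog entries at the stable
# timestamp, so the newest oplog entry must not be older than it.
oplog_behind_stable = top_of_oplog < stable
gap_seconds = stable[0] - top_of_oplog[0]

print("stable timestamp:", stable)
print("top of oplog:    ", top_of_oplog)
print("oplog ends before the stable timestamp:", oplog_behind_stable)
print("gap in seconds:", gap_seconds)
```

With the values from the question, the newest oplog entry is older than the stable timestamp, which matches the "Applied op ... not found. Top of oplog is ..." message right before the fassert.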

It would be better to let that node do a full initial sync. Stop mongod on that node, remove all files from its data directory, and start mongod again. The node will then automatically replicate all data from a fully synced member of the replica set.
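If you want to script the "remove all data-directory files" step, a minimal Python sketch could look like the one below. It is destructive, so only run it against the broken member while mongod is stopped. The helper name `clear_data_dir` is made up for this example, and the path in the usage note is an assumption based on the diagnostic.data path in your log; check your actual dbPath first.

```python
import shutil
from pathlib import Path


def clear_data_dir(dbpath):
    """Delete everything inside dbpath but keep the directory itself.

    Returns the sorted names of the top-level entries that were removed.
    """
    root = Path(dbpath)
    removed = []
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real subdirectory, e.g. diagnostic.data
        else:
            entry.unlink()  # regular file or symlink
        removed.append(entry.name)
    return sorted(removed)
```

Usage would be something like `clear_data_dir("/var/data/mongodb")` on the failing node only (path assumed, see above). In a Kubernetes setup the data directory normally lives on the pod's persistent volume, so the files have to be removed there, not just inside a restarted container.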

Upvotes: 4
