I want to replicate a collection from MongoDB into some other data sink. One approach (roughly what mongo-connector does) is: record the timestamp of the last oplog entry, then do a find() on the entire collection and iterate over that cursor, sending documents to the sink. Once the cursor is exhausted, read the oplog operations since the timestamp recorded at the beginning. In pseudocode:
lastTs = getLatestOplogTimestamp();          // time t1
doDumpOfCollection(collectionName);
streamOperationsFromOplog(since=lastTs);     // time t2
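To make the concern concrete, here is a minimal in-memory simulation of that pipeline (not real MongoDB or pymongo code; the names `source`, `sink`, `oplog`, and `record_push` are all invented for illustration). It shows how a write that lands while the dump cursor is being iterated can reach the sink once via the dump and a second time via a naive replay:

```python
# In-memory stand-ins: "source" plays the MongoDB collection,
# "oplog" collects operations recorded after lastTs (t1).
source = {"document1": {"items": []}}
sink = {}
oplog = []

def record_push(doc_id, item):
    # A non-idempotent append, like document1.items.append(item1),
    # recorded in the oplog as the delta that was applied.
    source[doc_id]["items"].append(item)
    oplog.append(("push", doc_id, item))

# t1: oplog position noted; the collection dump begins.
for doc_id, doc in source.items():
    # A concurrent write lands *while* the cursor is iterated,
    # so the dumped copy may already contain item1...
    record_push("document1", "item1")
    sink[doc_id] = {"items": list(doc["items"])}

# t2: replay the oplog since t1 against the sink, naively.
for op, doc_id, item in oplog:
    if op == "push":
        sink[doc_id]["items"].append(item)

print(sink["document1"]["items"])  # ['item1', 'item1'] -- duplicated
```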
Suppose that between t1 and t2 there were updates to collectionName, say an item was appended to an array in a subdocument, or some other stateful update:
document1.items.append(item1);
The questions are:
Will/can that new data show up in the cursor iteration?
What happens when I replay the oplog: can I end up with duplicate items in the array?
The documentation suggests that other operations can be "interleaved" with the query, which implies the answer to both questions is yes.
If so, is there a way to reliably replicate a single collection from MongoDB? It doesn't seem that something like mongo-connector can do it with strong data-integrity guarantees.
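For contrast, duplicates of this kind are avoided if replayed entries are idempotent, i.e. each entry records the resulting value rather than the delta, so applying it any number of times converges to the same state. MongoDB's real oplog is designed to be idempotent in a similar spirit; the sketch below is a simplified illustration with invented entry shapes, not the actual oplog format:

```python
# The dump already picked up the concurrent update.
sink = {"document1": {"items": ["item1"]}}

# Idempotent-style entry: carries the post-update array, not "append item1".
oplog = [("set", "document1", "items", ["item1"])]

def replay(entries, store):
    for op, doc_id, field, value in entries:
        if op == "set":
            store[doc_id][field] = value  # overwrite: safe to repeat

replay(oplog, sink)
replay(oplog, sink)  # replaying twice changes nothing
print(sink["document1"]["items"])  # ['item1'] -- no duplicate
```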