I have an application that continuously inserts documents into a MongoDB collection.
I'm looking for a way to query documents in their insertion order.
The candidates I considered:
- the _id field
- a creation date field
- a sequence number field
The _id field is not a good candidate, as the docs say. A creation date field could have been a good candidate; however, clocks may not be in sync, which may break the order. Regarding sequence numbers, the docs propose two approaches: counters and an optimistic loop. The counters approach doesn't guarantee insertion order, because a document D1 may be inserted after another document D2 even if D1.seq < D2.seq. For example, D1 seizes sequence number 5, then D2 seizes sequence number 6, then D2 is inserted, and only then is D1 inserted. The optimistic loop approach is impractical in a heavy-insert environment.
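For reference, the optimistic loop can be sketched as: read the highest seq, try to insert with seq + 1, let a unique index on seq reject the insert if another writer got there first, and retry. The sketch below is a hypothetical in-memory stand-in (the `SeqCollection` class is not a MongoDB API; error code 11000 mirrors MongoDB's duplicate key error). It illustrates why the loop gets expensive under heavy concurrent inserts: all but one of the writers that read the same maximum have to retry.

```javascript
// Hypothetical in-memory stand-in for a collection with a unique index on `seq`.
class SeqCollection {
  constructor() {
    this.bySeq = new Map(); // seq -> document; at most one document per seq
  }

  // Highest seq currently stored, or 0 if the collection is empty.
  maxSeq() {
    let max = 0;
    for (const seq of this.bySeq.keys()) if (seq > max) max = seq;
    return max;
  }

  // Insert that fails like a unique-index violation when seq is already taken.
  insert(doc) {
    if (this.bySeq.has(doc.seq)) {
      const err = new Error("duplicate key");
      err.code = 11000; // mirrors MongoDB's duplicate key error code
      throw err;
    }
    this.bySeq.set(doc.seq, doc);
  }
}

// One attempt: insert with observedMax + 1; false means another writer won.
function tryInsert(coll, payload, observedMax) {
  try {
    coll.insert({ seq: observedMax + 1, payload });
    return true;
  } catch (err) {
    if (err.code === 11000) return false;
    throw err;
  }
}

// The optimistic loop: re-read the maximum and retry until an insert succeeds.
function insertWithOptimisticLoop(coll, payload) {
  while (!tryInsert(coll, payload, coll.maxSeq())) {
    // lost the race: loop and read the new maximum
  }
}

// Heavy load: all writers read the same maximum before any of them inserts.
const coll = new SeqCollection();
const observed = coll.maxSeq();
let retries = 0;
for (const payload of ["D1", "D2", "D3"]) {
  if (!tryInsert(coll, payload, observed)) {
    retries += 1;
    insertWithOptimisticLoop(coll, payload);
  }
}
console.log(retries); // 2
```

The resulting seq values do follow insertion order, but each collision costs an extra round trip, and the number of collisions grows with the number of concurrent writers.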
Is there another approach?
EDIT:
The approach using counters is problematic. Consider the following scenario. I have an application A that continuously inserts documents into a collection, and another application B that continuously polls for documents from the same collection. Application A is multithreaded: two threads, T1 and T2, are about to insert documents D1 and D2, respectively. In the middle of these insertions, application B asks for more documents. Assume the following order of operations:
1. A-T1 seizes the next sequence number, N.
2. A-T2 seizes the next sequence number, N+1.
3. A-T2 inserts D2.
4. B asks for documents with seq >= N (assuming the last processed document has seq number N-1) and receives D2 (D1 has not been inserted yet).
5. A-T1 inserts D1.
6. B asks for documents with seq >= N+2 (since the last processed document has seq number N+1).
In this case, D1 will never be processed.
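The scenario above can be replayed deterministically. In this minimal sketch (plain JavaScript, no MongoDB involved), the hypothetical `getNextSequence` stands in for the counters collection and the hypothetical `poll` for application B's `seq >= N` query. B keeps a high-water mark, so D1, which is stored with a lower seq after the mark has already moved past it, is never returned.

```javascript
// `nextSeq` stands in for the counters collection; `docs` for the polled collection.
let nextSeq = 1;
const docs = [];

// Atomic in this single-threaded sketch, like the counter's $inc.
const getNextSequence = () => nextSeq++;

// B's polling step: fetch docs with seq >= fromSeq, return the next high-water mark.
function poll(fromSeq, processed) {
  const batch = docs
    .filter((d) => d.seq >= fromSeq)
    .sort((a, b) => a.seq - b.seq);
  for (const d of batch) processed.push(d.name);
  return batch.length ? batch[batch.length - 1].seq + 1 : fromSeq;
}

const processed = [];
let from = 1; // here N = 1, so B last processed seq N-1 = 0

const seqD1 = getNextSequence();       // A-T1 seizes N
const seqD2 = getNextSequence();       // A-T2 seizes N+1
docs.push({ name: "D2", seq: seqD2 }); // A-T2 inserts D2
from = poll(from, processed);          // B asks for seq >= N, gets only D2
docs.push({ name: "D1", seq: seqD1 }); // A-T1 inserts D1, too late
from = poll(from, processed);          // B asks for seq >= N+2, D1 is skipped

console.log(processed); // [ 'D2' ]
```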
If you expect tens of inserts per second, optimistic lock is the only way.
Otherwise clock sync could be a better idea.
Considering counters, could you elaborate on how it affects your application if D1 is persisted after D2, given that you guarantee the order in which the sequence numbers are acquired? The insert operation in MongoDB itself has multiple stages, and you can go as deep as relying on journaling.
EDIT
Would you consider a tailable cursor as an option for application B? It does not answer the question directly, but it may solve the problem behind the question.
EDIT 2
Then you probably need some kind of message queue to communicate between the apps, as in the image. It may be overkill, but if you are sure that the optimistic lock is a bottleneck, it may be acceptable.
In the image below:
1. Instances of application A insert a document in any order and retrieve its unique ObjectId from the Mongo client.
2. Instances of application A send the ObjectId to the queue in any order.
3. Instances of application B get the next ObjectId from the queue.
4. Instances of application B fetch the document by ID from the database.
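A minimal in-memory sketch of this hand-off, assuming a `Map` as the collection, an array as the FIFO queue, and a hypothetical `newId` in place of the ObjectId that MongoDB drivers generate on the client. Because B consumes ids from the queue instead of tracking a sequence watermark, a document that is published late is still delivered:

```javascript
// Hypothetical stand-ins: a Map for the collection, an array for the FIFO queue.
const db = new Map();
const queue = [];
let counter = 0;
const newId = () => `id-${++counter}`; // stands in for a driver-generated ObjectId

// Application A, step 1: insert the document and keep its id.
function insertDoc(payload) {
  const id = newId();
  db.set(id, { _id: id, payload });
  return id;
}

// Application A, step 2: publish the id (in whatever order threads get there).
function publish(id) {
  queue.push(id);
}

// Application B, steps 3 and 4: take the next id and fetch the document by id.
function consumeNext() {
  const id = queue.shift();
  return id === undefined ? null : db.get(id);
}

// T1 inserts D1 before T2 inserts D2, but T2 publishes first.
const idD1 = insertDoc("D1");
const idD2 = insertDoc("D2");
publish(idD2);
publish(idD1);

const seen = [];
for (let doc = consumeNext(); doc !== null; doc = consumeNext()) {
  seen.push(doc.payload);
}
console.log(seen); // [ 'D2', 'D1' ]
```

One caveat with this design: if an instance of A crashes between inserting and publishing, the document is stored but never queued, so some form of reconciliation may still be needed.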
EDIT 3
Finally, you may consider adding a status to the document and shifting the optimistic lock to application B:
1. Retrieve the ObjectId of an unprocessed document:
var doc = db.collection.findOne({ status: null }, { _id: 1 }) // null when nothing is left to process
var objectId = doc._id
2. Change its status to 'processing':
db.collection.findAndModify({
  query: { _id: objectId, status: null },
  update: { $set: { status: 'processing' } }
})
If it returns null, the document is being processed by another instance of B, so return to step 1.
3. Process the document and update its status to 'done':
db.collection.findAndModify({
  query: { _id: objectId, status: 'processing' },
  update: { $set: { status: 'done' } }
})
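The claim protocol in steps 1 to 3 can be simulated in memory to show two instances of B racing for the same document. Here `findAndModify` is a hypothetical stand-in that mimics the atomic match-and-`$set` of the real command (it is not the MongoDB API), and the hypothetical `claim` is the retry loop from step 2:

```javascript
// Hypothetical in-memory collection with two unprocessed documents.
const docs = [
  { _id: 1, payload: "D1", status: null },
  { _id: 2, payload: "D2", status: null },
];

// Step 1: id of some document whose status is still null, or null if none.
function findUnprocessedId() {
  const d = docs.find((doc) => doc.status === null);
  return d ? d._id : null;
}

// Steps 2 and 3: atomic compare-and-set on status; returns the old doc or null.
function findAndModify(id, fromStatus, toStatus) {
  const d = docs.find((doc) => doc._id === id && doc.status === fromStatus);
  if (!d) return null;
  const before = { ...d };
  d.status = toStatus;
  return before;
}

// One instance of B: try the id from step 1, go back to step 1 on a lost race.
function claim(startId) {
  let id = startId;
  while (id !== null) {
    if (findAndModify(id, null, "processing") !== null) return id; // we own it
    id = findUnprocessedId(); // another instance got it first
  }
  return null; // nothing left to process
}

const seenByB1 = findUnprocessedId(); // both instances run step 1 ...
const seenByB2 = findUnprocessedId(); // ... and pick the same document
const ownedByB1 = claim(seenByB1);
const ownedByB2 = claim(seenByB2);    // loses doc 1, claims doc 2 instead
for (const id of [ownedByB1, ownedByB2]) findAndModify(id, "processing", "done");

console.log(ownedByB1, ownedByB2);       // 1 2
console.log(docs.map((d) => d.status)); // [ 'done', 'done' ]
```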
With this approach, you don't care about the exact sequence at all. If you would like to process documents in order, you can add a timestamp, or rely on the ObjectId, to sort documents in step 1. It may not be the exact order, of course, but you don't need the exact order to guarantee that all documents are processed.
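Why sorting by ObjectId is only approximate: an ObjectId is 12 bytes (a 4-byte timestamp in seconds, a 5-byte per-process random value, and a 3-byte counter), and ids compare byte by byte. Two ids created in the same second by different processes are therefore ordered by their random bytes rather than by creation time, and the timestamps themselves come from the clients' clocks. The ids below are made-up hex strings, and `objectIdSeconds` is a hypothetical helper:

```javascript
// Seconds since the Unix epoch, stored big-endian in the first 4 bytes (8 hex chars).
function objectIdSeconds(hex) {
  return parseInt(hex.slice(0, 8), 16);
}

// Two hypothetical ObjectIds generated in the same second by different processes.
// Layout: 4-byte timestamp | 5-byte per-process random value | 3-byte counter.
const createdFirst = "65a0000a" + "ffffffffff" + "000001";  // process P, earlier
const createdSecond = "65a0000a" + "0000000001" + "000001"; // process Q, later

// Lowercase hex strings of equal length sort in the same order as their bytes.
const sorted = [createdFirst, createdSecond].sort();

console.log(sorted[0] === createdSecond); // true: the later id sorts first
console.log(objectIdSeconds(createdFirst) === objectIdSeconds(createdSecond)); // true
```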