Reputation: 2498
When we run a Mongo find() query without any sort order specified, what does the database internally use to sort the results?
According to the documentation on the mongo website:
When executing a find() with no parameters, the database returns objects in forward natural order.
For standard tables, natural order is not particularly useful because, although the order is often close to insertion order, it is not guaranteed to be. However, for Capped Collections, natural order is guaranteed to be the insertion order. This can be very useful.
However for standard collections (non capped collections), what field is used to sort the results? Is it the _id field or something else?
Edit:
Basically, I guess what I am trying to get at is that if I execute the following search query:
db.collection.find({"x":y}).skip(10000).limit(1000);
At two different points in time: t1 and t2, will I get different result sets:
I have run some tests on a temp database and the results I have gotten are the same (Yes) for all the 3 cases - but I wanted to be sure and I am certain that my test cases weren't very thorough.
Upvotes: 164
Views: 62164
Reputation: 65313
The default internal sort order (or natural order) is an undefined implementation detail. Maintaining order is extra overhead for storage engines, and MongoDB's API does not mandate predictability outside of an explicit sort() or the special cases of clustered collections and fixed-sized capped collections.
For typical workloads it is desirable for the storage engine to try to reuse available preallocated space and make decisions about how to most efficiently store data on disk and in memory. Without any query criteria, results will be returned by the storage engine in natural order (aka in the order they are found). Result order may coincide with insertion order but this behaviour is not guaranteed and cannot be relied on (aside from clustered or capped collections).
Some examples that may affect storage (natural) order:
WiredTiger uses a different representation of documents on disk versus the in-memory cache, so natural ordering may change based on internal data structures.
The original MMAPv1 storage engine (removed in MongoDB 4.2) allocates record space for documents based on padding rules. If a document outgrows the currently allocated record space, the document location (and natural ordering) will be affected. New documents can also be inserted in storage marked available for reuse due to deleted or moved documents.
Replication uses an idempotent oplog format to apply write operations consistently across replica set members. Each replica set member maintains local data files that can vary in natural order, but will have the same data outcome when oplog updates are applied.
If an index is used, documents will be returned in the order they are found (which does not necessarily match insertion order or I/O order). If more than one index is used, the order depends internally on which index first identified the document during the de-duplication process.
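To make the space-reuse point concrete, here is a toy in-memory model. It is only a sketch: ToyStore, its slot array, and its free list are invented for illustration and do not reflect how WiredTiger or MMAPv1 actually lay out data. It assumes only that a scan returns documents in storage-slot order and that freed slots are reused by later inserts.

```javascript
// Toy model (hypothetical, not a real storage engine): documents live in
// numbered slots, and a scan returns them in slot order.
class ToyStore {
  constructor() {
    this.slots = []; // storage positions; null marks a freed slot
    this.free = [];  // indexes of freed slots available for reuse
  }

  insert(doc) {
    if (this.free.length > 0) {
      // Reuse the oldest freed slot instead of appending at the end.
      const index = this.free.shift();
      this.slots[index] = doc;
    } else {
      this.slots.push(doc);
    }
  }

  remove(id) {
    const index = this.slots.findIndex((d) => d !== null && d._id === id);
    if (index !== -1) {
      this.slots[index] = null;
      this.free.push(index);
    }
  }

  // "Natural order" in this model: whatever order the slots happen to be in.
  scan() {
    return this.slots.filter((d) => d !== null);
  }
}

const store = new ToyStore();
store.insert({ _id: 1 });
store.insert({ _id: 2 });
store.insert({ _id: 3 });
store.remove(2);          // frees the middle slot
store.insert({ _id: 4 }); // lands in the freed slot, not at the end

// Surviving documents were inserted in the order 1, 3, 4,
// but the scan returns them in slot order.
const scanOrder = store.scan().map((d) => d._id); // [1, 4, 3]
```

In this model the most recently inserted document shows up in the middle of the scan, which is the kind of drift from insertion order described above.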
If you want a predictable sort order you must include an explicit sort() with your query and have unique values for your sort key.
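Applied to the skip/limit query from the question, that could look like the sketch below. findPage is a hypothetical helper name, and sorting on _id is just one choice of unique key; any sort that ends in a unique field would serve the same purpose.

```javascript
// Hypothetical helper: `collection` is expected to be a mongosh collection
// object such as db.collection. Nothing here is a built-in MongoDB API
// beyond find(), sort(), skip() and limit().
function findPage(collection, filter, pageIndex, pageSize) {
  return collection
    .find(filter)
    .sort({ _id: 1 })            // _id is unique, so ties cannot occur
    .skip(pageIndex * pageSize)  // e.g. page 10 of 1000 => skip(10000)
    .limit(pageSize);
}

// Usage in mongosh (assumes `db` and a value `y` exist in your session):
//   findPage(db.collection, { x: y }, 10, 1000)
```

With the explicit sort, two runs over the same data return the same page. Writes that happen between the two runs can still shift which documents fall at a given skip offset.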
The implementation exception noted for natural order in capped collections is enforced by their special usage restrictions: documents are stored in insertion order but existing document size cannot be increased and documents cannot be explicitly deleted. Ordering is part of the capped collection design that ensures the oldest documents "age out" first.
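As a rough sketch of the capped case in mongosh terms: the collection name "events" and the 1 MiB size below are illustrative assumptions, and both helper functions are hypothetical wrappers around createCollection() and a $natural sort.

```javascript
// `db` is expected to be the mongosh database handle.
// The name "events" and the size are placeholders, not recommendations.
function createCappedLog(db) {
  db.createCollection("events", { capped: true, size: 1024 * 1024 });
  return db.getCollection("events");
}

// Read the most recent n documents from a capped collection.
// { $natural: -1 } walks the collection in reverse natural order, which for
// a capped collection is reverse insertion order.
function newestFirst(cappedCollection, n) {
  return cappedCollection.find().sort({ $natural: -1 }).limit(n);
}

// Usage in mongosh:
//   const events = createCappedLog(db);
//   newestFirst(events, 10)
```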
Starting in MongoDB 5.3, it is possible to create a clustered collection where documents are ordered by _id index key values. The clusteredIndex must be declared when the collection is created. Clustered collections have some usage limitations but can improve performance for queries like range scans and equality comparisons on the clustered index key.
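A minimal sketch of declaring the clusteredIndex at creation time, assuming MongoDB 5.3 or later. The helper name and the index name are made up for illustration; the option shape follows the createCollection() clusteredIndex option.

```javascript
// `db` is expected to be the mongosh database handle.
// createClusteredCollection and the "<name>_clustered" index name are
// hypothetical; only the option shape passed to createCollection() matters.
function createClusteredCollection(db, name) {
  return db.createCollection(name, {
    clusteredIndex: {
      key: { _id: 1 },            // clustered collections are keyed on _id
      unique: true,               // required for a clustered index
      name: name + "_clustered",  // optional, illustrative index name
    },
  });
}

// Usage in mongosh:
//   createClusteredCollection(db, "orders")
```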
Upvotes: 172
Reputation: 26012
Results are returned in stored order (the order in the data file), which is not guaranteed to match insertion order. They are not sorted by the _id field. Sometimes the results can look like they are sorted by insertion order, but that can change in another request. It is not reliable.
Upvotes: 17