Reputation: 25
My inventory collection is as follows:
{ "_id": 1, "dept": "A", "item": { "sku": "111", "color": "red" }, "sizes": [ "S", "M" ] }
{ "_id": 2, "dept": "A", "item": { "sku": "111", "color": "blue" }, "sizes": [ "M", "L" ] }
{ "_id": 3, "dept": "B", "item": { "sku": "222", "color": "blue" }, "sizes": "S" }
{ "_id": 4, "dept": "A", "item": { "sku": "333", "color": "black" }, "sizes": [ "S" ] }
Mongo documentation specifies distinct command as follows:
Finds the distinct values for a specified field across a single collection. distinct returns a document that contains an array of the distinct values.
On executing distinct on dept field and SKU fields returns as follows:
db.inventory.distinct( "dept" )
outputs: [ "A", "B" ]
db.inventory.distinct( "item.sku" )
ouputs: [ "111", "222", "333" ]
This clearly indicates distinct command ordering, which by default depends on the order by which documents are inserted into the collection.
But the distinct on array fields like sizes ideally (db.inventory.distinct( "sizes" )
) should return ["S", "M", "L"]
but in turn returns [ "M", "S", "L" ]
This behaviour is with default indexing added only on the _id field and no other field indexing.
Any insights into mongo's distinct implementation are helpful. Thanks in advance.
PS: My requirement is a collection A contains user metadata(userId, password, createdDate), I use distinct on A(userId field) and create documents in another collection B to keep daily snapshots of user-related data like orders. Typical B document would be like: { date: datetime, userData: [{user0's orders}, {user1's orders} ... ] }
At end of the month, I need aggregation of users0's order data. User metadata can keep adding new users and all users almost have daily order activity. So default ordering of mongo distinct matters here.
Upvotes: 0
Views: 331
Reputation: 2194
If you want to keep a previous ordering while applying distinct, you need to go with the approach of mongo.aggregate
and you need to apply the $sort
before and after applying $group
.
So, for example, if you want to keep the sorting criteria of updated_at
column in a descending order while applying the distinct of the name
column in your collection, you should do as following:
db.collection.aggregate([
{ $sort: { updated_at: -1 } }, // Sort documents by updated_at Descending
{
$group: {
_id: "$name", // Group Documents by name
updated_at: { $first: "$updated_at" } // Capture most recent updated_at
}
},
{ $sort: { updated_at: -1 } }, // Sort groups by updated_at descending (need to be sorted again because the $group removes the sort on its output)
{ $project: { name: "$_id", _id: 0 } } // Project the desired output by filling "name" with the result of "_id" (created from the $group clause) and removes the "_id" afterwards
])
[
{ "name": "apocalypse1212" },
{ "name": "victorcorcos" },
{ "name": "Belly Buster" }
]
Upvotes: 0
Reputation: 8834
The order results are returned is not guaranteed, so you should assume it is arbitrary.
If you need the results in a specific order, just add a .sort()
to your query.
Upvotes: 2