apocalypse1212
apocalypse1212

Reputation: 25

What is the default ordering of Mongo distinct method?

My inventory collection is as follows:

{ "_id": 1, "dept": "A", "item": { "sku": "111", "color": "red" }, "sizes": [ "S", "M" ] }
{ "_id": 2, "dept": "A", "item": { "sku": "111", "color": "blue" }, "sizes": [ "M", "L" ] }
{ "_id": 3, "dept": "B", "item": { "sku": "222", "color": "blue" }, "sizes": "S" }
{ "_id": 4, "dept": "A", "item": { "sku": "333", "color": "black" }, "sizes": [ "S" ] }

Mongo documentation specifies distinct command as follows:

Finds the distinct values for a specified field across a single collection. distinct returns a document that contains an array of the distinct values.

On executing distinct on dept field and SKU fields returns as follows:

db.inventory.distinct( "dept" )

outputs: [ "A", "B" ]

db.inventory.distinct( "item.sku" )

ouputs: [ "111", "222", "333" ]

This clearly indicates distinct command ordering, which by default depends on the order by which documents are inserted into the collection.

But the distinct on array fields like sizes ideally (db.inventory.distinct( "sizes" )) should return ["S", "M", "L"] but in turn returns [ "M", "S", "L" ]

This behaviour is with default indexing added only on the _id field and no other field indexing.

Any insights into mongo's distinct implementation are helpful. Thanks in advance.

PS: My requirement is a collection A contains user metadata(userId, password, createdDate), I use distinct on A(userId field) and create documents in another collection B to keep daily snapshots of user-related data like orders. Typical B document would be like: { date: datetime, userData: [{user0's orders}, {user1's orders} ... ] }

At end of the month, I need aggregation of users0's order data. User metadata can keep adding new users and all users almost have daily order activity. So default ordering of mongo distinct matters here.

Upvotes: 0

Views: 331

Answers (2)

Victor Cordeiro Costa
Victor Cordeiro Costa

Reputation: 2194

If you want to keep a previous ordering while applying distinct, you need to go with the approach of mongo.aggregate and you need to apply the $sort before and after applying $group.

So, for example, if you want to keep the sorting criteria of updated_at column in a descending order while applying the distinct of the name column in your collection, you should do as following:

db.collection.aggregate([
  { $sort: { updated_at: -1 } }, // Sort documents by updated_at Descending
  {
    $group: {
      _id: "$name", // Group Documents by name
      updated_at: { $first: "$updated_at" } // Capture most recent updated_at
    }
  },
  { $sort: { updated_at: -1 } }, // Sort groups by updated_at descending (need to be sorted again because the $group removes the sort on its output)
  { $project: { name: "$_id", _id: 0 } } // Project the desired output by filling "name" with the result of "_id" (created from the $group clause) and removes the "_id" afterwards
])
  • Output Example
[
  { "name": "apocalypse1212" },
  { "name": "victorcorcos" },
  { "name": "Belly Buster" }
]

Upvotes: 0

Belly Buster
Belly Buster

Reputation: 8834

The order results are returned is not guaranteed, so you should assume it is arbitrary.

If you need the results in a specific order, just add a .sort() to your query.

Upvotes: 2

Related Questions