RedTailedHawk
RedTailedHawk

Reputation: 179

Mongo query on last sorted array item

Would like to query the following to obtain all item documents such that the last sale (ordered by soldDate) has a status of 2.

db.items.insertMany([
    { item: 1,
      sales: [
          { soldDate: ISODate("2021-10-04"), status: 1 },
          { soldDate: ISODate("2021-10-05"), status: 2 }
      ]
    },
    { item: 2,
      sales: [
          { soldDate: ISODate("2021-09-29"), status: 3 },
          { soldDate: ISODate("2021-09-24"), status: 1 }
      ]
    },
    { item: 3,
      sales: [
          { soldDate: ISODate("2021-06-01"), status: 3 },
          { soldDate: ISODate("2021-06-12"), status: 2 },
          { soldDate: ISODate("2021-06-07"), status: 1 }
      ]
    }
]);

So in this example, the query would return the following two documents:

    { item: 1,
      sales: [
          { soldDate: ISODate("2021-10-04"), status: 1 },
          { soldDate: ISODate("2021-10-05"), status: 2 }   // triggered by this
      ]
    },
    { item: 3,
      sales: [
          { soldDate: ISODate("2021-06-01"), status: 3 },
          { soldDate: ISODate("2021-06-12"), status: 2 },  // triggered by this
          { soldDate: ISODate("2021-06-07"), status: 1 }
      ]
    }

Thanks for any help.

Upvotes: 0

Views: 78

Answers (1)

barrypicker
barrypicker

Reputation: 10088

You stated: ordered by soldDate which can actually mean two things. Perhaps you want the documents sorted by the array, or perhaps you mean the array is sorted. I assumed the later.

Solution (Array sorted)

db.items.aggregate([
    { $match: { "sales.status": 2} },
    { $unwind: "$sales" },
    { $sort: { "item": 1, "sales.soldDate": 1} },
    { $group: { "_id": "$_id", "item": { $first: "$item" }, "sales": { $push: "$sales" } } }
])

Results

Enterprise replSet [primary] barrydb> db.items.aggregate([
...     { $match: { "sales.status": 2} },
...     { $unwind: "$sales" },
...     { $sort: { "item": 1, "sales.soldDate": 1} },
...     { $group: { "_id": "$_id", "item": { $first: "$item" }, "sales": { $push: "$sales" } } }
... ])
[
  {
    _id: ObjectId("617064519be05d9f1cbab346"),
    item: 1,
    sales: [
      { soldDate: ISODate("2021-10-04T00:00:00.000Z"), status: 1 },
      { soldDate: ISODate("2021-10-05T00:00:00.000Z"), status: 2 }
    ]
  },
  {
    _id: ObjectId("617064519be05d9f1cbab348"),
    item: 3,
    sales: [
      { soldDate: ISODate("2021-06-01T00:00:00.000Z"), status: 3 },
      { soldDate: ISODate("2021-06-07T00:00:00.000Z"), status: 1 },
      { soldDate: ISODate("2021-06-12T00:00:00.000Z"), status: 2 }
    ]
  }
]

But, to be complete here is a solution if you want the documents sorted (and the array not necessarily sorted).

Solution (Documents sorted)

db.items.aggregate([
    { $match: { "sales.status": 2} },
    { $sort: { "sales.soldDate": 1} }
])

Results

Enterprise replSet [primary] barrydb> db.items.aggregate([
...     { $match: { "sales.status": 2} },
...     { $sort: { "sales.soldDate": 1} }
... ])
[
  {
    _id: ObjectId("617064519be05d9f1cbab348"),
    item: 3,
    sales: [
      { soldDate: ISODate("2021-06-01T00:00:00.000Z"), status: 3 },
      { soldDate: ISODate("2021-06-12T00:00:00.000Z"), status: 2 },
      { soldDate: ISODate("2021-06-07T00:00:00.000Z"), status: 1 }
    ]
  },
  {
    _id: ObjectId("617064519be05d9f1cbab346"),
    item: 1,
    sales: [
      { soldDate: ISODate("2021-10-04T00:00:00.000Z"), status: 1 },
      { soldDate: ISODate("2021-10-05T00:00:00.000Z"), status: 2 }
    ]
  }
]

EDIT - After re-reading I believe you want only where the record having a status of 2 is also has the greatest date in the array

Solution (Only last having status of value 2 - docs and array unsorted)

db.items.aggregate([
    { $unwind: "$sales" },
    { $sort: { "item": 1, "sales.soldDate": -1} },
    { $group: { "_id": "$_id", "item": { $first: "$item" }, "sales": { $push: "$sales" } } },
    { $match : { "sales.0.status" : 2 } }
])

Results

Enterprise replSet [primary] barrydb> db.items.aggregate([
...     { $unwind: "$sales" },
...     { $sort: { "item": 1, "sales.soldDate": -1} },
...     { $group: { "_id": "$_id", "item": { $first: "$item" }, "sales": { $push: "$sales" } } },
...     { $match : { "sales.0.status" : 2 } }
... ])
[
  {
    _id: ObjectId("617064519be05d9f1cbab346"),
    item: 1,
    sales: [
      { soldDate: ISODate("2021-10-05T00:00:00.000Z"), status: 2 },
      { soldDate: ISODate("2021-10-04T00:00:00.000Z"), status: 1 }
    ]
  },
  {
    _id: ObjectId("617064519be05d9f1cbab348"),
    item: 3,
    sales: [
      { soldDate: ISODate("2021-06-12T00:00:00.000Z"), status: 2 },
      { soldDate: ISODate("2021-06-07T00:00:00.000Z"), status: 1 },
      { soldDate: ISODate("2021-06-01T00:00:00.000Z"), status: 3 }
    ]
  }
]

EDIT - Add Self Referencing Lookup

db.items.aggregate([
    { $unwind: "$sales" },
    { $sort: { "item": 1, "sales.soldDate": -1} },
    { $group: { "_id": "$_id", "item": { $first: "$item" }, "sales": { $push: "$sales" } } },
    { $match : { "sales.0.status" : 2 } },
    { $lookup : {
            from: "items",
            localField: "_id",
            foreignField: "_id",
            as: "results"
        }
    },
    { $unwind: "$results" },
    { $replaceRoot: { "newRoot": "$results" } }
])

With the self-referencing lookup we are treating MongoDB as a relational database. We find the documents that meet our requirements, but in doing so we have destroyed the original shape and content. By performing a lookup on the same records we can restore the shape but at a performance penalty.

Retain Copy

Rather than performing a lookup, which has a performance concern, a different approach is to leverage memory on the server. Keep a copy of the original while moving through the pipeline and manipulating the original to identify desired records...

db.items.aggregate([
    { $addFields: { "_original": "$$ROOT" } },
    { $unwind: "$sales" },
    { $sort: { "item": 1, "sales.soldDate": -1} },
    { $group: { "_id": "$_id", "_original": { $first: "$_original" }, "sales_status": { $push: "$sales.status" } } },
    { $match : { "sales_status.0" : 2 } },
    { $replaceRoot: { "newRoot": "$_original" } }
])

In this example we keep a copy of the original in the field _original then once we have identified the records we want we pivot the root back to _original. This may put pressure on the WiredTiger cache as we are keeping a duplicate of all selected records in memory during the execution of the pipeline. A $lookup approach also has this memory concern. Two queries would eliminate the cache pressure issues, but behaves like a $lookup and would not perform as well.

Upvotes: 1

Related Questions