Yaron Schwimmer
Yaron Schwimmer

Reputation: 5357

MongoDB aggregate - match using lte and fallback to gte

I have a MongoDB collection with a structure (simplified) like this:

[
  {
    "name" : "name1",
    "instances" : [ 
      {
        "value" : 1,
        "score" : 2,
        "date" : ISODate("2015-03-04T00:00:00.000Z")
      }, 
      {
        "value" : 2,
        "score" : 5,
        "date" : ISODate("2015-04-01T00:00:00.000Z")
      }, 
      {
        "value" : 2.5,
        "score" : 9,
        "date" : ISODate("2015-03-05T00:00:00.000Z")
      }
    ]
  },
  {
    "name" : "name2",
    "instances" : [ 
      {
        "value" : 6,
        "score" : 3,
        "date" : ISODate("2015-03-05T00:00:00.000Z")
      }, 
      {
        "value" : 1,
        "score" : 6,
        "date" : ISODate("2015-03-04T00:00:00.000Z")
      }, 
      {
        "value" : 3.7,
        "score" : 5.2,
        "date" : ISODate("2015-02-04T00:00:00.000Z")
      }
    ]
  },
  {
    "name" : "name3",
    "instances" : [ 
      {
        "value" : 6,
        "score" : 3,
        "date" : ISODate("2015-03-05T00:00:00.000Z")
      }, 
      {
        "value" : 1,
        "score" : 6,
        "date" : ISODate("2015-03-04T00:00:00.000Z")
      }, 
      {
        "value" : 3.7,
        "score" : 5.2,
        "date" : ISODate("2015-02-04T00:00:00.000Z")
      }
    ]
  }
]

Currently I have an aggregate query that pulls a single instance from each document by a given date:

db.myCollection.aggregate([
  {$unwind: "$instances"},      
  {$sort: {'instances.date': -1}},
  {$match: {'instances.date': {$lte: <givenDate>}}},
  {$project: {name: 1, _id: 0, date: "$instances.date", value: "$instances.value", score: "$instances.score"}},
  {$group: {_id: "$name", name: {$first: "$name"}, date: {$first: "$date"},
    value: {$first: "$value"}, score: {$first: "$score"}}}
])

This query works just fine, and for a given date will return the latest (that is, exactly or before the given date) instance from every document.

My problem begins when the given date is prior to the earliest instance. For example, if my given date is 2015-03-02, I will not get any instance from name1. In that case, I want to retrieve the earliest instance available in the document.

Obviously, I can split this task into two different queries and merge the results, but I would like to achieve this goal in a single DB query if possible.

Any ideas?

Upvotes: 3

Views: 6669

Answers (1)

bagrat
bagrat

Reputation: 7428

The Pipeline

Try this pipeline, and then let's go step-by-step:

[
   {$unwind: "$instances"},
   {$project: {
                 _id: 0, 
                 name: 1, 
                 date: '$instances.date', 
                 matches: {
                             $cond: [
                                       {$lte: ['$instances.date', new Date(<YOUR DATE>)]}, 
                                       1, 
                                       0
                             ]
                 }, 
                 score: '$instances.score', 
                 value: '$instances.value'
              }
   }, 
   {$group: {
                 _id: '$name', 
                 instances: {
                               $push: {
                                         date: '$date', 
                                         score: '$score', 
                                         value: '$value', 
                                         matches: '$matches'
                               }
                 }, 
                 hasMatches: {$sum: '$matches'}
            }
   }, 
   {$unwind: "$instances"}, 
   {$project: {
                 _id: 0, 
                 name: '$_id', 
                 date: '$instances.date', 
                 hasMatches: '$hasMatches', 
                 matches: '$instances.matches', 
                 score: '$instances.score', 
                 value: '$instances.value'
              }
   }, 
   {$sort: {'name': 1, 'matches': -1, 'date': -1}}, 
   {$group: {
                 _id: {name: '$name', hasMatches: '$hasMatches'}, 
                 last_date: {$last: '$date'}, 
                 last_score: {$last: '$score'}, 
                 last_value: {$last: '$value'}, 
                 first_date: {$first: '$date'}, 
                 first_score: {$first: '$score'}, 
                 first_value: {$first: '$value'}}
   }, 
   {$project: {
                 name: '$_id.name', 
                 date: {$cond: ['$_id.hasMatches', '$first_date', '$last_date']}, 
                 score: {$cond: ['$_id.hasMatches', '$first_score', '$last_score']}, 
                 value: {$cond: ['$_id.hasMatches', '$first_value', '$last_value']}, 
                 _id: 0}
   }
]

The Explanation

The first $unwind and $project stages are simple and obvious, I only added a matches field which indicates whether the unwounded document matches your criteria.

Then we $group back the documents, and at the same time $sum up the matches field into the new hasMatches. The resulting documents now contain hasMatches field which indicates whether the instances array contains at least one element that matches your criteria.

Then again, we $unwind and $project, and afterwards $group again, keeping the hasMatches field and storing both $first and $last values for date, value and score for further processing.

Now the situation is as follows:

  • If there was at least one element in the initial array, that matched the criteria, in the result of sorting it has appeared as the first document in its group.

  • If there was no element in the initial array that matched the criteria, then in the result of sorting, the element with the earliest date appeared as the last document in its group.

So, as we have the hasMatches field which indicates the above condition, as well as both first_X and last_X values, we can easily pick one of those, depending on the hasMatches value. Thus the last $project stage, which does exactly that.

The Results

Here are the results for the dates you have mentioned in the comments:

'2015-03-04':

{ "name" : "name3", "date" : ISODate("2015-03-04T00:00:00Z"), "score" : 6, "value" : 1 }
{ "name" : "name2", "date" : ISODate("2015-03-04T00:00:00Z"), "score" : 6, "value" : 1 }
{ "name" : "name1", "date" : ISODate("2015-03-04T00:00:00Z"), "score" : 2, "value" : 1 }

'2015-03-02':

{ "name" : "name3", "date" : ISODate("2015-02-04T00:00:00Z"), "score" : 5.2, "value" : 3.7 }
{ "name" : "name2", "date" : ISODate("2015-02-04T00:00:00Z"), "score" : 5.2, "value" : 3.7 }
{ "name" : "name1", "date" : ISODate("2015-03-04T00:00:00Z"), "score" : 2, "value" : 1 }

Upvotes: 3

Related Questions