Reputation: 5357
I have a MongoDB collection with a structure (simplified) like this:
[
{
"name" : "name1",
"instances" : [
{
"value" : 1,
"score" : 2,
"date" : ISODate("2015-03-04T00:00:00.000Z")
},
{
"value" : 2,
"score" : 5,
"date" : ISODate("2015-04-01T00:00:00.000Z")
},
{
"value" : 2.5,
"score" : 9,
"date" : ISODate("2015-03-05T00:00:00.000Z")
}
]
},
{
"name" : "name2",
"instances" : [
{
"value" : 6,
"score" : 3,
"date" : ISODate("2015-03-05T00:00:00.000Z")
},
{
"value" : 1,
"score" : 6,
"date" : ISODate("2015-03-04T00:00:00.000Z")
},
{
"value" : 3.7,
"score" : 5.2,
"date" : ISODate("2015-02-04T00:00:00.000Z")
}
]
},
{
"name" : "name3",
"instances" : [
{
"value" : 6,
"score" : 3,
"date" : ISODate("2015-03-05T00:00:00.000Z")
},
{
"value" : 1,
"score" : 6,
"date" : ISODate("2015-03-04T00:00:00.000Z")
},
{
"value" : 3.7,
"score" : 5.2,
"date" : ISODate("2015-02-04T00:00:00.000Z")
}
]
}
]
Currently I have an aggregate
query that pulls a single instance from each document by a given date:
db.myCollection.aggregate([
{$unwind: "$instances"},
{$sort: {'instances.date': -1}},
{$match: {'instances.date': {$lte: <givenDate>}}},
{$project: {name: 1, _id: 0, date: "$instances.date", value: "$instances.value", score: "$instances.score"}},
{$group: {_id: "$name", name: {$first: "$name"}, date: {$first: "$date"},
value: {$first: "$value"}, score: {$first: "$score"}}}
])
This query works just fine, and for a given date will return the latest (that is, exactly or before the given date) instance from every document.
My problem begins when the given date is prior to the earliest instance. For example, if my given date is 2015-03-02, I will not get any instance from name1
. In that case, I want to retrieve the earliest instance available in the document.
Obviously, I can split this task into two different queries and merge the results, but I would like to achieve this goal in a single DB query if possible.
Any ideas?
Upvotes: 3
Views: 6669
Reputation: 7428
The Pipeline
Try this pipeline, and then let's go step-by-step:
[
{$unwind: "$instances"},
{$project: {
_id: 0,
name: 1,
date: '$instances.date',
matches: {
$cond: [
{$lte: ['$instances.date', new Date(<YOUR DATE>)]},
1,
0
]
},
score: '$instances.score',
value: '$instances.value'
}
},
{$group: {
_id: '$name',
instances: {
$push: {
date: '$date',
score: '$score',
value: '$value',
matches: '$matches'
}
},
hasMatches: {$sum: '$matches'}
}
},
{$unwind: "$instances"},
{$project: {
_id: 0,
name: '$_id',
date: '$instances.date',
hasMatches: '$hasMatches',
matches: '$instances.matches',
score: '$instances.score',
value: '$instances.value'
}
},
{$sort: {'name': 1, 'matches': -1, 'date': -1}},
{$group: {
_id: {name: '$name', hasMatches: '$hasMatches'},
last_date: {$last: '$date'},
last_score: {$last: '$score'},
last_value: {$last: '$value'},
first_date: {$first: '$date'},
first_score: {$first: '$score'},
first_value: {$first: '$value'}}
},
{$project: {
name: '$_id.name',
date: {$cond: ['$_id.hasMatches', '$first_date', '$last_date']},
score: {$cond: ['$_id.hasMatches', '$first_score', '$last_score']},
value: {$cond: ['$_id.hasMatches', '$first_value', '$last_value']},
_id: 0}
}
]
The Explanation
The first $unwind
and $project
stages are simple and obvious, I only added a matches
field which indicates whether the unwounded document matches your criteria.
Then we $group
back the documents, and at the same time $sum
up the matches
field into the new hasMatches
. The resulting documents now contain hasMatches
field which indicates whether the instances
array contains at least one element that matches your criteria.
Then again, we $unwind
and $project
, and afterwards $group
again, keeping the hasMatches
field and storing both $first
and $last
values for date
, value
and score
for further processing.
Now the situation is as follows:
If there was at least one element in the initial array, that matched the criteria, in the result of sorting it has appeared as the first document in its group.
If there was no element in the initial array that matched the criteria, then in the result of sorting, the element with the earliest date appeared as the last document in its group.
So, as we have the hasMatches
field which indicates the above condition, as well as both first_X
and last_X
values, we can easily pick one of those, depending on the hasMatches
value. Thus the last $project
stage, which does exactly that.
The Results
Here are the results for the dates you have mentioned in the comments:
'2015-03-04':
{ "name" : "name3", "date" : ISODate("2015-03-04T00:00:00Z"), "score" : 6, "value" : 1 }
{ "name" : "name2", "date" : ISODate("2015-03-04T00:00:00Z"), "score" : 6, "value" : 1 }
{ "name" : "name1", "date" : ISODate("2015-03-04T00:00:00Z"), "score" : 2, "value" : 1 }
'2015-03-02':
{ "name" : "name3", "date" : ISODate("2015-02-04T00:00:00Z"), "score" : 5.2, "value" : 3.7 }
{ "name" : "name2", "date" : ISODate("2015-02-04T00:00:00Z"), "score" : 5.2, "value" : 3.7 }
{ "name" : "name1", "date" : ISODate("2015-03-04T00:00:00Z"), "score" : 2, "value" : 1 }
Upvotes: 3