Reputation: 2426
I have the following aggregate query, which gives me counts (countA) for a given date range, in this case 01/01/2016-03/31/2016. Is it possible to add a second date range, for example 04/01/2016-07/31/2016, and count those as countB?
db.getCollection('customers').aggregate(
{$match: {"status": "Closed"}},
{$unwind: "$lines"},
{$match: {"lines.status": "Closed"}},
{$match: {"lines.deliveryMethod": "Tech Delivers"}},
{$match: {"date": {$gte: new Date('01/01/2016'), $lte: new Date('03/31/2016')}}},
{$group:{_id:"$lines.productLine",countA: {$sum: 1}}}
)
Thanks in advance
Upvotes: 3
Views: 1846
Reputation: 50406
Sure, and you can also simplify your pipeline quite a lot. Successive $match stages are really a single stage, and you should always put match criteria at the beginning of an aggregation pipeline. Even though an initial $match doesn't actually "filter" the array content, it at least selects only the documents containing entries that will actually match. This speeds things up considerably, especially with large data sets.
For the two date ranges, this is just an $or query argument. It is applied "before" the array filtering is done, since it is a document-level match to begin with. So again, it goes in the very first pipeline $match:
db.getCollection('customers').aggregate([
  // Filter all document conditions first. Reduces things to process.
  { "$match": {
    "status": "Closed",
    "lines": { "$elemMatch": {
      "status": "Closed",
      "deliveryMethod": "Tech Delivers"
    }},
    "$or": [
      { "date": {
        "$gte": new Date("2016-01-01"),
        "$lt": new Date("2016-04-01")
      }},
      { "date": {
        "$gte": new Date("2016-04-01"),
        "$lt": new Date("2016-08-01")
      }}
    ]
  }},
  // Unwind the array
  { "$unwind": "$lines" },
  // Filter just the matching elements
  // Successive $match is really just one pipeline stage
  { "$match": {
    "lines.status": "Closed",
    "lines.deliveryMethod": "Tech Delivers"
  }},
  // Then group on the productLine values within the array
  { "$group": {
    "_id": "$lines.productLine",
    "countA": {
      "$sum": {
        "$cond": [
          { "$and": [
            { "$gte": [ "$date", new Date("2016-01-01") ] },
            { "$lt": [ "$date", new Date("2016-04-01") ] }
          ]},
          1,
          0
        ]
      }
    },
    "countB": {
      "$sum": {
        "$cond": [
          { "$and": [
            { "$gte": [ "$date", new Date("2016-04-01") ] },
            { "$lt": [ "$date", new Date("2016-08-01") ] }
          ]},
          1,
          0
        ]
      }
    }
  }}
])
The $or basically "joins" two result sets, as it looks for "either" range criteria to apply. Because it is given in addition to the other arguments, it combines with them as an "AND" condition: a document must meet the other criteria and also satisfy one of the $or arguments. Note that the $gte and $lt combination is also another form of expressing "AND" conditions on the same key.
The $elemMatch is applied since "both" criteria are required on the same array element. If you just applied them directly with "dot notation", then all that really asks is that "at least one array element" matches each condition, rather than a single array element matching "both" conditions.
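The difference can be illustrated outside of MongoDB. The following is only a plain JavaScript analogy of the two matching semantics, not how the server evaluates queries; the sample document and the helper names `matchesDotNotation` and `matchesElemMatch` are made up for illustration.

```javascript
// Hypothetical document: no single element is both "Closed" and "Tech Delivers".
const doc = {
  status: "Closed",
  lines: [
    { status: "Closed", deliveryMethod: "Customer Pickup" },
    { status: "Open", deliveryMethod: "Tech Delivers" }
  ]
};

// Analogy for { "lines.status": ..., "lines.deliveryMethod": ... }:
// each condition may be satisfied by a different array element.
const matchesDotNotation = (d) =>
  d.lines.some((l) => l.status === "Closed") &&
  d.lines.some((l) => l.deliveryMethod === "Tech Delivers");

// Analogy for { "lines": { "$elemMatch": { ... } } }:
// one and the same element must satisfy both conditions.
const matchesElemMatch = (d) =>
  d.lines.some(
    (l) => l.status === "Closed" && l.deliveryMethod === "Tech Delivers"
  );

console.log(matchesDotNotation(doc)); // true
console.log(matchesElemMatch(doc));   // false
```

With "dot notation" this document would be selected even though it contributes nothing after the array is filtered, which is the extra work $elemMatch avoids.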
The later filtering after $unwind can use the "dot notation" since the array elements are now "de-normalised" into separate documents. So there is now only one element per document to match the conditions.
When you apply the $group, instead of just using { "$sum": 1 } you "conditionally" assess whether to count the document or not by using $cond. Since both date ranges are within the results, you just need to determine whether the current document being "rolled up" belongs to one date range or the other. As a "ternary" (if/then/else) operator, this is what $cond provides.
It looks at the value of "date" in the document, and if it matches the condition (first argument, "if") it returns 1 (second argument, "then"); otherwise it returns 0 (third argument, "else"), effectively not adding to the current count.
Since these are "logical" conditions, the "AND" is expressed with the logical $and operator, which itself returns true or false and requires both contained conditions to be true.
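To make the conditional counting concrete, here is a rough plain JavaScript equivalent of what the $group stage does with $sum and $cond. It is a sketch only: the three "unwound" documents and the `inRange` helper are hypothetical, and the ternary stands in for $cond.

```javascript
// Half-open ranges [from, to), the same bounds as used in the pipeline.
const rangeA = [new Date("2016-01-01"), new Date("2016-04-01")];
const rangeB = [new Date("2016-04-01"), new Date("2016-08-01")];

// Stand-in for { "$and": [ { "$gte": ... }, { "$lt": ... } ] }.
const inRange = (date, [from, to]) => date >= from && date < to;

// Hypothetical documents as they would look after $unwind.
const docs = [
  { date: new Date("2016-02-10"), lines: { productLine: "Widgets" } },
  { date: new Date("2016-05-20"), lines: { productLine: "Widgets" } },
  { date: new Date("2016-06-01"), lines: { productLine: "Gadgets" } }
];

const counts = {};
for (const doc of docs) {
  const key = doc.lines.productLine; // the $group "_id"
  const group = counts[key] || (counts[key] = { countA: 0, countB: 0 });
  // Ternary = $cond: add 1 when the date is in range, otherwise add 0.
  group.countA += inRange(doc.date, rangeA) ? 1 : 0;
  group.countB += inRange(doc.date, rangeB) ? 1 : 0;
}

console.log(counts);
// { Widgets: { countA: 1, countB: 1 }, Gadgets: { countA: 0, countB: 1 } }
```

Every document passes through both accumulators, and the condition alone decides which counter it increments.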
Also note the correction in the Date object constructors. If you do not instantiate with the string in that representation (an ISO "yyyy-mm-dd" string), the resulting Date is in "localtime" as opposed to the "UTC" format in which MongoDB stores dates. Only use a "local" constructor if you really mean that, and often people really don't.
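A quick way to see the difference is to print both forms. This sketch relies on standard JavaScript behaviour for the ISO date-only string (parsed as UTC midnight); the "mm/dd/yyyy" form is not a standard format, and the comment on it describes what engines such as V8 typically do rather than a guarantee.

```javascript
// ISO date-only string: parsed as midnight UTC by the language specification.
const utc = new Date("2016-01-01");
console.log(utc.toISOString()); // 2016-01-01T00:00:00.000Z

// Non-ISO string: parsing is implementation-defined; common engines treat it
// as midnight in the machine's local time zone, so the stored instant shifts
// by the local UTC offset.
const local = new Date("01/01/2016");
console.log(local.toISOString()); // depends on the local time zone

// An explicit way to build the same UTC instant from numeric parts
// (months are zero-based).
const explicitUtc = new Date(Date.UTC(2016, 0, 1));
console.log(explicitUtc.getTime() === utc.getTime()); // true
```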
The other note is the $lt date change: the upper bound should always be "one day" greater than the last date you are looking for. Remember these are "beginning of day" dates, and you usually want all possible times within the last date, not just up to its beginning. So "less than the next day" is the correct condition.
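As a small illustration with a hypothetical timestamp on the afternoon of the last day of the range, compare the two ways of writing the upper bound:

```javascript
// Hypothetical document date: 31 March 2016, 15:30 UTC.
const lateOnLastDay = new Date("2016-03-31T15:30:00Z");

const lastDay = new Date("2016-03-31"); // midnight at the START of 31 March
const nextDay = new Date("2016-04-01"); // midnight at the start of 1 April

// "$lte" the last day misses anything after 00:00 on that day.
const withLte = lateOnLastDay <= lastDay;
// "$lt" the next day covers the whole of the last day.
const withLt = lateOnLastDay < nextDay;

console.log(withLte); // false
console.log(withLt);  // true
```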
For the record, with MongoDB versions from 2.6, it's likely better to "pre-filter" the array content "before" you $unwind. This removes the overhead of producing new documents in the "de-normalizing" step for array elements that would not match the conditions you want to apply.
For MongoDB 3.2 and greater, use $filter:
db.getCollection('customers').aggregate([
  // Filter all document conditions first. Reduces things to process.
  { "$match": {
    "status": "Closed",
    "lines": { "$elemMatch": {
      "status": "Closed",
      "deliveryMethod": "Tech Delivers"
    }},
    "$or": [
      { "date": {
        "$gte": new Date("2016-01-01"),
        "$lt": new Date("2016-04-01")
      }},
      { "date": {
        "$gte": new Date("2016-04-01"),
        "$lt": new Date("2016-08-01")
      }}
    ]
  }},
  // Pre-filter the array content to matching elements
  { "$project": {
    // Keep "date": $project drops unlisted fields, and $group still needs it
    "date": 1,
    "lines": {
      "$filter": {
        "input": "$lines",
        "as": "line",
        "cond": {
          "$and": [
            { "$eq": [ "$$line.status", "Closed" ] },
            { "$eq": [ "$$line.deliveryMethod", "Tech Delivers" ] }
          ]
        }
      }
    }
  }},
  // Unwind the array
  { "$unwind": "$lines" },
  // Then group on the productLine values within the array
  { "$group": {
    "_id": "$lines.productLine",
    "countA": {
      "$sum": {
        "$cond": [
          { "$and": [
            { "$gte": [ "$date", new Date("2016-01-01") ] },
            { "$lt": [ "$date", new Date("2016-04-01") ] }
          ]},
          1,
          0
        ]
      }
    },
    "countB": {
      "$sum": {
        "$cond": [
          { "$and": [
            { "$gte": [ "$date", new Date("2016-04-01") ] },
            { "$lt": [ "$date", new Date("2016-08-01") ] }
          ]},
          1,
          0
        ]
      }
    }
  }}
])
Or for at least MongoDB 2.6, apply $redact instead:
db.getCollection('customers').aggregate([
  // Filter all document conditions first. Reduces things to process.
  { "$match": {
    "status": "Closed",
    "lines": { "$elemMatch": {
      "status": "Closed",
      "deliveryMethod": "Tech Delivers"
    }},
    "$or": [
      { "date": {
        "$gte": new Date("2016-01-01"),
        "$lt": new Date("2016-04-01")
      }},
      { "date": {
        "$gte": new Date("2016-04-01"),
        "$lt": new Date("2016-08-01")
      }}
    ]
  }},
  // Pre-filter the array content to matching elements
  { "$redact": {
    "$cond": {
      "if": {
        "$and": [
          { "$eq": [ "$status", "Closed" ] },
          { "$eq": [
            { "$ifNull": [ "$deliveryMethod", "Tech Delivers" ] },
            "Tech Delivers"
          ]}
        ]
      },
      "then": "$$DESCEND",
      "else": "$$PRUNE"
    }
  }},
  // Unwind the array
  { "$unwind": "$lines" },
  // Then group on the productLine values within the array
  { "$group": {
    "_id": "$lines.productLine",
    "countA": {
      "$sum": {
        "$cond": [
          { "$and": [
            { "$gte": [ "$date", new Date("2016-01-01") ] },
            { "$lt": [ "$date", new Date("2016-04-01") ] }
          ]},
          1,
          0
        ]
      }
    },
    "countB": {
      "$sum": {
        "$cond": [
          { "$and": [
            { "$gte": [ "$date", new Date("2016-04-01") ] },
            { "$lt": [ "$date", new Date("2016-08-01") ] }
          ]},
          1,
          0
        ]
      }
    }
  }}
])
Note the funny little $ifNull in there, which is necessary due to the recursive nature of $$DESCEND: all levels of the document are inspected, including the "top level" document, before "descending" into subsequent arrays and members or even nested objects. The "status" field is present at the top level and has a value of "Closed" due to the earlier query selection criteria, but of course there is no "top level" field called "deliveryMethod", since it exists only within the array elements.
That is basically the "care" that needs to be taken when using $redact like this. If the structure of the document does not allow such conditions, then it's not really an option, so revert to processing with $unwind then $match instead.
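Why the $ifNull matters can be seen in a much simplified simulation of the recursion. This is not MongoDB's implementation, just a plain JavaScript sketch of the "test this level, then descend or prune" idea; the `redact` and `isSubDocument` helpers and the sample document are hypothetical.

```javascript
// Treat plain objects (but not arrays or Dates) as embedded documents.
const isSubDocument = (v) =>
  v !== null && typeof v === "object" &&
  !Array.isArray(v) && !(v instanceof Date);

// keep(level) === true  -> like $$DESCEND: keep this level, inspect children
// keep(level) === false -> like $$PRUNE: drop this level entirely
function redact(level, keep) {
  if (!keep(level)) return undefined;
  const out = {};
  for (const [key, value] of Object.entries(level)) {
    if (Array.isArray(value)) {
      out[key] = value
        .map((v) => (isSubDocument(v) ? redact(v, keep) : v))
        .filter((v) => v !== undefined);
    } else if (isSubDocument(value)) {
      const sub = redact(value, keep);
      if (sub !== undefined) out[key] = sub;
    } else {
      out[key] = value;
    }
  }
  return out;
}

// With the $ifNull-style default, a level without "deliveryMethod" passes.
const withIfNull = (d) =>
  d.status === "Closed" &&
  (d.deliveryMethod == null ? "Tech Delivers" : d.deliveryMethod) === "Tech Delivers";
// Without it, the top level fails the test because the field is missing there.
const withoutIfNull = (d) =>
  d.status === "Closed" && d.deliveryMethod === "Tech Delivers";

const sample = {
  status: "Closed",
  lines: [
    { status: "Closed", deliveryMethod: "Tech Delivers", productLine: "Widgets" },
    { status: "Open", deliveryMethod: "Tech Delivers", productLine: "Gadgets" },
    { status: "Closed", deliveryMethod: "Customer Pickup", productLine: "Gizmos" }
  ]
};

const kept = redact(sample, withIfNull);
const pruned = redact(sample, withoutIfNull);
console.log(kept.lines.map((l) => l.productLine)); // [ 'Widgets' ]
console.log(pruned); // undefined: the whole document was pruned at the top level
```

The same reasoning shows when the approach breaks down: if a field name used in the condition exists at more than one level with different meanings, a single condition cannot serve every level.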
But where possible, use those methods in preference to the $unwind then $match processing, as the newer techniques save considerable time and use fewer resources.
Upvotes: 5