Reputation: 183
I am trying to formulate a query over the sample bios collection http://docs.mongodb.org/manual/reference/bios-example-collection/:
Retrieve all persons and their awards that they received before receiving a Turing award.
I have come up with this query:
db.bios.aggregate([
{$match: {"awards.award" : "Turing Award"}},
{$project: {"award1": "$awards", "award2": "$awards", "first_name": "$name.first", "last_name": "$name.last"}},
{$unwind: "$award1"},
{$match: {"award1.award" : "Turing Award"}},
{$unwind: "$award2"},
{$redact: {
$cond: {
if: { $eq: [ { $gt: [ "$award1.year", "$award2.year"] }, true]},
then: "$$KEEP",
else: "$$PRUNE"
}
}
}
])
And this is the answer:
/* 0 */
{
"result" : [
{
"_id" : 1,
"award1" : {
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
"award2" : {
"award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society"
},
"first_name" : "John",
"last_name" : "Backus"
},
{
"_id" : 1,
"award1" : {
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
"award2" : {
"award" : "National Medal of Science",
"year" : 1975,
"by" : "National Science Foundation"
},
"first_name" : "John",
"last_name" : "Backus"
},
{
"_id" : 4,
"award1" : {
"award" : "Turing Award",
"year" : 2001,
"by" : "ACM"
},
"award2" : {
"award" : "Rosing Prize",
"year" : 1999,
"by" : "Norwegian Data Association"
},
"first_name" : "Kristen",
"last_name" : "Nygaard"
},
{
"_id" : 5,
"award1" : {
"award" : "Turing Award",
"year" : 2001,
"by" : "ACM"
},
"award2" : {
"award" : "Rosing Prize",
"year" : 1999,
"by" : "Norwegian Data Association"
},
"first_name" : "Ole-Johan",
"last_name" : "Dahl"
}
],
"ok" : 1
}
What I don't like about this solution is that I unwind $award2
. Instead, I would be happy to keep award2 as an array, and only remove those awards that were received after award1. So, for instance, the answer for John Backus should be:
{
"_id" : 1,
"first_name" : "John",
"last_name" : "Backus",
"award1" : {
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
"award2" : [
{
"award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society"
},
{
"award" : "National Medal of Science",
"year" : 1975,
"by" : "National Science Foundation"
}
]
}
Is it possible to achieve it with $redact
without doing $unwind: "$award2"
?
Upvotes: 8
Views: 6775
Reputation: 1295
You can use a single project stage with nested expressions to accomplish this by avoiding multiple stages:
db.bios.aggregate([
{$match : {"awards.award" : "Turing Award"}},
{$project : {
award1 : { $arrayElemAt : [{
$filter : {
input : "$awards",
as : "award",
cond : {$eq : ["$$award.award","Turing Award"]}
}}, 0]},
award2 : { $let : {
vars : {
turing_year : { $let : {
vars : {
turingAward :{"$arrayElemAt" : [{"$filter" : {
input : "$awards",
as : "award",
cond : {$eq : ["$$award.award","Turing Award"]}
}}, 0]}},
in : "$$turingAward.year"}}},
in : {
$filter : {
input : "$awards",
as : "award",
cond : {$lt : ["$$award.year", "$$turing_year"]}
}
}
}},
first_name : "$name.first",
last_name : "$name.last"}
}]).pretty();
Please check the documentation here for a set of useful array operators.
However, aggregation doesn't look pretty for this query and the logic is simple enough to be implemented in the code itself without much performance impacts; just agreeing with Blakes Seven's answer. But, one of the neat things about MongoDB is we can design the schema to support our access patterns and keep our code clean. If such a functionality is needed in a real scenario, we can simply include a field called "turing_award_year" in the document. This will impact the CRUD operations on the collection, but the code will be clean, and now we can use a query like this which is pretty and easier to maintain:
db.bios.aggregate(
[
{$match : {"awards.award" : "Turing Award"}},
{$project : {
award1 : { $arrayElemAt : [{
$filter : {
input : "$awards",
as : "award",
cond : {$eq : ["$$award.award","Turing Award"]}
}}, 0]
},
award2 : { $filter : {
input : "$awards",
as : "award",
cond : {$lt : ["$$award.year", "$turing_award_year"]}
}
}
,
first_name : "$name.first",
last_name : "$name.last"
}}
]
).pretty();
Upvotes: 0
Reputation: 50406
It might have been a little more helpful if you had included the original state of the document as an example in your question as this clearly shows "where you are coming from" and then to "where you want to get to" as a goal in addition to your desired output as given.
That's just a tip, but it seems that you are starting with a document like this:
{
"_id" : 1,
"name": {
"first" : "John",
"last" : "Backus"
},
"awards" : [
{
"award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society"
},
{
"award" : "National Medal of Science",
"year" : 1975,
"by" : "National Science Foundation"
},
{
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
{
"award" : "Some other award",
"year" : 1979,
"by" : "Someone Else"
}
]
}
So the real points here is that while you may haved reached for $redact
here (and it is a bit nicer than using $project
for a logical condition and then using $match
to filter that logical match ) this probably isn't the best tool for the comparison you want to do here.
Before moving on I just want to point out the main problem here with $redact
. Whatever you do here the logic ( without an unwind ) would be essentially to compare "directly" on $$DESCEND
in order to process the the array elements on the value of "year" at whatever level.
That recursion is going to invalidate the "award1" condition as well since it has the same field name. Even renaming that field kills the logic since a projected value where it was missing would not be greater than the tested value.
In a nutshell, $redact
is ruled right out since you cannot say "take from here only" with the logic it applies.
The alternate is to use $map
and $setDifference
to filter contents from the arrays as follows:
db.bios.aggregate([
{ "$match": { "awards.award": "Turing Award" } },
{ "$project": {
"first_name": "$name.first",
"last_name": "$name.last",
"award1": { "$setDifference": [
{ "$map": {
"input": "$awards",
"as": "a",
"in": { "$cond": [
{ "$eq": [ "$$a.award", "Turing Award" ] },
"$$a",
false
]}
}},
[false]
]},
"award2": { "$setDifference": [
{ "$map": {
"input": "$awards",
"as": "a",
"in": { "$cond": [
{ "$ne": [ "$$a.award", "Turing Award" ] },
"$$a",
false
]}
}},
[false]
]}
}},
{ "$unwind": "$award1" },
{ "$project": {
"first_name": 1,
"last_name": 1,
"award1": 1,
"award2": { "$setDifference": [
{ "$map": {
"input": "$award2",
"as": "a",
"in": { "$cond": [
{ "$gt": [ "$award1.year", "$$a.year" ] },
"$$a",
false
]}
}},
[false]
]}
}}
])
And there really is no "pretty" way of getting around either the usage of $unwind
in the itermediatary stage or even the second $project
here, since $map
( and the $setDifference
filter ) returns what is "still an array". So the $unwind
is necessary to make the "array" a singular ( provided your condition only matches 1 element ) entry for which to use in comparison.
Trying to "squish" all the logic in a single $project
will only result in "arrays of arrays" in the second output, and still some "unwinding" therefore required, but at least this way unwinding the (hopefully) 1 match is not really that costly and keeps the output clean.
But the other thing to really note here is that you are not really "aggregating" anything here at all. This is just document manipulation, so you might well consider to just do that manipulation directly in client code. As demonstrated with this shell example:
db.bios.find(
{ "awards.award": "Turing Award" },
{ "name": 1, "awards": 1 }
).forEach(function(doc) {
doc.first_name = doc.name.first;
doc.last_name = doc.name.last;
doc.award1 = doc.awards.filter(function(award) {
return award.award == "Turing Award"
})[0];
doc.award2 = doc.awards.filter(function(award) {
return doc.award1.year > award.year;
});
delete doc.name;
delete doc.awards;
printjson(doc);
})
At any rate, both approaches will output the same:
{
"_id" : 1,
"first_name" : "John",
"last_name" : "Backus",
"award1" : {
"award" : "Turing Award",
"year" : 1977,
"by" : "ACM"
},
"award2" : [
{
"award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society"
},
{
"award" : "National Medal of Science",
"year" : 1975,
"by" : "National Science Foundation"
}
]
}
The only real difference here is that by using .aggregate()
the content of "award2" will already be filtered when returned from the server, which probably isn't going to be that much different from doing the client processing approach unless the items that would be removed comprises a reasonably large list per document.
For the record, the only alteration to your existing aggregation pipeline really required here would be to add a $group
to the end to "re-combine" the array entries into a single document:
db.bios.aggregate([
{ "$match": { "awards.award": "Turing Award" } },
{ "$project": {
"first_name": "$name.first",
"last_name": "$name.last",
"award1": "$awards",
"award2": "$awards"
}},
{ "$unwind": "$award1" },
{ "$match": {"award1.award" : "Turing Award" }},
{ "$unwind": "$award2" },
{ "$redact": {
"$cond": {
"if": { "$gt": [ "$award1.year", "$award2.year"] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$group": {
"_id": "$_id",
"first_name": { "$first": "$first_name" },
"last_name": { "$first": "$last_name" },
"award1": { "$first": "$award1" },
"award2": { "$push": "$award2" }
}}
])
But then again, there is all that "array duplication" and the "cost of unwind" associated with all the operations here. So either of the first two approaches is what you really want in order to avoid that.
Upvotes: 5