Reputation: 2135
I have a mongoose Group schema which contains invitee (array of sub document) and currentMove, invitee also contains currentMove and I want to get document with only sub document that have same currentMove.
Group.findById("5a03fa29fafa645c8a399353")
.populate({
path: 'invitee.user_id',
select: 'currentMove',
model:"User",
match: {
"currentMove":{
$eq: "$currentMove"
}
}
})
This generates unknown currentMove Object id for match query. I'm not sure if mongoose has this functionality. Can anyone help me, please?
Upvotes: 4
Views: 2533
Reputation: 151122
In modern MongoDB releases it is far more efficient to use $lookup
here instead of .populate()
. Also the basic concept that you want to filter based on a comparison of fields is something that MongoDB does quite well with native operators, but it's not something you can easily transpose into .populate()
.
In fact the only way possible to actually use with .populate()
would be to first retrieve all results, and then use Model.populate()
with a $where
clause on query all whilst processing the result array with Array.map()
in order to apply the local value of each document to the conditions to "join" on.
It's all kind of messy, and involves pulling all results from the server and filtering locally. So $lookup
is our best option here, where all of the "filtering" and "matching" actually takes place on the server without needing to pull unnecessary documents over the network just to obtain a result.
You don't actually include a "schema" in your question, so we can only work with an approximation based on what parts you actually do include in the question. So my example here uses:
const userSchema = new Schema({
name: String,
currentMove: Number
})
const groupSchema = new Schema({
name: String,
topic: String,
currentMove: Number,
invitee: [{
user_id: { type: Schema.Types.ObjectId, ref: 'User' },
confirmed: { type: Boolean, default: false }
}]
});
From here we have different approaches to the $lookup
queries. The first basically involves applying $unwind
both before and after the $lookup
stage. This is partly since your "reference" is an embedded field within the array, and also partly because it's actually the most efficient query form to use here with a possible "join" result that could potentially exceed the BSON limit ( 16MB for the document ) being avoided:
Group.aggregate([
{ "$unwind": "$invitee" },
{ "$lookup": {
"from": User.collection.name,
"localField": "invitee.user_id",
"foreignField": "_id",
"as": "invitee.user_id"
}},
{ "$unwind": "$invitee.user_id" },
{ "$redact": {
"$cond": {
"if": { "$eq": ["$currentMove", "$invitee.user_id.currentMove"] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"topic": { "$first": "$topic" },
"currentMove": { "$first": "$currentMove" },
"invitee": { "$push": "$invitee" }
}}
]);
The key expression here is the $redact
which is processed after the $lookup
result is returned. This allows a logical comparison of the "currentMove"
values from both the parent document and the "joined" detail for the User
objects.
Since we $unwind
the array content, we use $group
with $push
to reconstruct the array ( if you must ) and select the other fields of the original document using $first
.
There are ways to examine the schema and generate such a stage, but that's not really in the scope of the question. An example can be seen on Querying after populate in Mongoose. Point being that if you want the fields returned, then you would construct this pipeline stage around using those expressions to return a document of the original shape.
An alternate approach where you are certain that the "unfiltered" result of the "join" will not cause the document to exceed the BSON limit is to instead make a separate target array, and then reconstruct your "joined" array content using $map
and $filter
, and other array operators:
Group.aggregate([
{ "$lookup": {
"from": User.collection.name,
"localField": "invitee.user_id",
"foreignField": "_id",
"as": "inviteeT"
}},
{ "$addFields": {
"invitee": {
"$map": {
"input": {
"$filter": {
"input": "$inviteeT",
"as": "i",
"cond": { "$eq": ["$$i.currentMove","$currentMove"] }
}
},
"as": "i",
"in": {
"_id": {
"$arrayElemAt": [
"$invitee._id",
{ "$indexOfArray": ["$invitee.user_id", "$$i._id"] }
]
},
"user_id": "$$i",
"confirmed": {
"$arrayElemAt": [
"$invitee.confirmed",
{ "$indexOfArray": ["$invitee.user_id","$$i._id"] }
]
}
}
}
}
}},
{ "$project": { "inviteeT": 0 } },
{ "$match": { "invitee.0": { "$exists": true } } }
]);
Instead of the $redact
which would be filtering "documents", we use $filter
here with the expression to only return those members of the target array "inviteeT"
which share the same "currentMove"
. Since this is just the "foreign" content, we "join" with the original array using $map
and transposing the elements.
To do that "transposition" of values from the original array, we use the $arrayElemAt
and $indexOfArray
expressions. The $indexOfArray
allows us to match up the target's "_id"
values with the "user_id"
values in the original array and get it's "index" position. We always know this returns a real match because the $lookup
did that part for us.
The "index" value is then supplied to $arrayElemAt
which similarly applies a "mapping" of the values as an array like "$invitee.confirmed"
and returns the value matched at the same index. This is basically a "lookup" between the arrays.
Differing from the first pipeline example, we now still have the "inviteeT"
array as well as our re-written "invitee"
array courtesy of $addFields
. So one way to get rid of that is to add an additional $project
and exclude the unwanted "temporary" array. And of course since we did not $unwind
and "filter", there are still possible results with no matching array entries at all. So the $match
expression uses $exists
to test for the 0
index being present in the array result, which means there is "at least one" result, and discards any documents with empty arrays.
MongoDB 3.6 makes this a bit cleaner as a new syntax for $lookup
allows a more expressive "pipeline" to be given in argument to select the results returned, rather than the simplistic "localField"
and "foreignField"
matching.
Group.aggregate([
{ "$lookup": {
"from": User.collection.name,
"let": {
"ids": "$invitee._id",
"users": "$invitee.user_id",
"confirmed": "$invitee.confirmed",
"currentMove": "$currentMove"
},
"pipeline": [
{ "$match": {
"$expr": {
"$and": [
{ "$in": ["$_id", "$$users"] },
{ "$eq": ["$currentMove", "$$currentMove"] }
]
}
}},
{ "$project": {
"_id": {
"$arrayElemAt": [
"$$ids",
{ "$indexOfArray": ["$$users", "$_id"] }
]
},
"user_id": "$$ROOT",
"confirmed": {
"$arrayElemAt": [
"$$confirmed",
{ "$indexOfArray": ["$$users", "$_id"] }
]
}
}}
],
"as": "invitee"
}},
{ "$match": { "invitee.0": { "$exists": true } } }
])
So there are some slightly "glitchy" things in there with the usage of mapping arrays of specific values for input due to how these are currently passed into the sub-pipeline via the "let"
declaration. This should probably work cleaner, but on the current release candidate this is how it's actually required to be expressed in order to work.
With this new syntax the "let"
allows us to declare "variables" from the current document which can then be referenced in the "pipeline"
expression which will be executed in order to determine which results to return to the target array.
The $expr
here essentially replaces the $redact
or $filter
conditions used before, as well as combining the "local" to "foreign" key matching which also requires us to declare such a variable. Here we mapped the "$invitee.user_id"
values from the source document into a variable which we refer to as "$$users"
in the rest of the expressions.
The $in
operator here is a variant for the aggregation framework which returns a boolean condition where the first argument "value" is found in the second argument "array". So this is the "foreign key" filter part.
Since this is a "pipeline", we can add a $project
stage in addition to the $match
which selected the items from the foreign collection. So again we use a similar "transposition" technique to what was described before. This then gives us control of the "shape" of the documents returned in the array, so we don't manipulate the returned array "after" the $lookup
like we did previously.
The same case applies though, since no matter what you do here the "sub-pipeline" can of course return no results when the filter conditions do not match. So again the same $exists
test is used to discard those documents.
So it's all pretty cool, and once you get used to the power available in the server side "join" functionality of $lookup
you likely will never look back. Whilst the syntax is a lot more terse than the "convenience" function that .populate()
was introduced for, the reduced traffic load, far more advanced uses and general expressiveness basically make up for that.
As a complete example, I'm also including a self contained listing that demonstrates all of these. And if you run it with a MongoDB 3.6 compatible server attached, then you will even get that demonstration as well.
Needs a recent Node.js v8.x release to run with async/await
( or enable in other supported ), but since that's now the LTS release you really should be running that anyway. At least install one to test :)
const mongoose = require('mongoose'),
Schema = mongoose.Schema;
mongoose.Promise = global.Promise;
mongoose.set('debug',true);
const uri = 'mongodb://localhost/rollgroup',
options = { useMongoClient: true };
const userSchema = new Schema({
name: String,
currentMove: Number
})
const groupSchema = new Schema({
name: String,
topic: String,
currentMove: Number,
invitee: [{
user_id: { type: Schema.Types.ObjectId, ref: 'User' },
confirmed: { type: Boolean, default: false }
}]
});
const User = mongoose.model('User', userSchema);
const Group = mongoose.model('Group', groupSchema);
function log(data) {
console.log(JSON.stringify(data, undefined, 2))
}
(async function() {
try {
const conn = await mongoose.connect(uri,options);
let { version } = await conn.db.admin().command({'buildInfo': 1});
// Clean data
await Promise.all(
Object.entries(conn.models).map(([k,m]) => m.remove() )
);
// Add some users
let users = await User.insertMany([
{ name: 'Bill', currentMove: 1 },
{ name: 'Ted', currentMove: 2 },
{ name: 'Fred', currentMove: 3 },
{ name: 'Sally', currentMove: 4 },
{ name: 'Harry', currentMove: 5 }
]);
await Group.create({
name: 'Group1',
topic: 'This stuff',
currentMove: 3,
invitee: users.map( u =>
({ user_id: u._id, confirmed: (u.currentMove === 3) })
)
});
await (async function() {
console.log('Unwinding example');
let result = await Group.aggregate([
{ "$unwind": "$invitee" },
{ "$lookup": {
"from": User.collection.name,
"localField": "invitee.user_id",
"foreignField": "_id",
"as": "invitee.user_id"
}},
{ "$unwind": "$invitee.user_id" },
{ "$redact": {
"$cond": {
"if": { "$eq": ["$currentMove", "$invitee.user_id.currentMove"] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$group": {
"_id": "$_id",
"name": { "$first": "$name" },
"topic": { "$first": "$topic" },
"currentMove": { "$first": "$currentMove" },
"invitee": { "$push": "$invitee" }
}}
]);
log(result);
})();
await (async function() {
console.log('Using $filter example');
let result = await Group.aggregate([
{ "$lookup": {
"from": User.collection.name,
"localField": "invitee.user_id",
"foreignField": "_id",
"as": "inviteeT"
}},
{ "$addFields": {
"invitee": {
"$map": {
"input": {
"$filter": {
"input": "$inviteeT",
"as": "i",
"cond": { "$eq": ["$$i.currentMove","$currentMove"] }
}
},
"as": "i",
"in": {
"_id": {
"$arrayElemAt": [
"$invitee._id",
{ "$indexOfArray": ["$invitee.user_id", "$$i._id"] }
]
},
"user_id": "$$i",
"confirmed": {
"$arrayElemAt": [
"$invitee.confirmed",
{ "$indexOfArray": ["$invitee.user_id","$$i._id"] }
]
}
}
}
}
}},
{ "$project": { "inviteeT": 0 } },
{ "$match": { "invitee.0": { "$exists": true } } }
]);
log(result);
})();
await (async function() {
if (parseFloat(version.match(/\d\.\d/)[0]) >= 3.6) {
console.log('New $lookup example. Yay!');
let result = await Group.collection.aggregate([
{ "$lookup": {
"from": User.collection.name,
"let": {
"ids": "$invitee._id",
"users": "$invitee.user_id",
"confirmed": "$invitee.confirmed",
"currentMove": "$currentMove"
},
"pipeline": [
{ "$match": {
"$expr": {
"$and": [
{ "$in": ["$_id", "$$users"] },
{ "$eq": ["$currentMove", "$$currentMove"] }
]
}
}},
{ "$project": {
"_id": {
"$arrayElemAt": [
"$$ids",
{ "$indexOfArray": ["$$users", "$_id"] }
]
},
"user_id": "$$ROOT",
"confirmed": {
"$arrayElemAt": [
"$$confirmed",
{ "$indexOfArray": ["$$users", "$_id"] }
]
}
}}
],
"as": "invitee"
}},
{ "$match": { "invitee.0": { "$exists": true } } }
]).toArray();
log(result);
}
})();
await (async function() {
console.log("Horrible populate example :(");
let results = await Group.find();
results = await Promise.all(
results.map( r =>
User.populate(r,{
path: 'invitee.user_id',
match: { "$where": `this.currentMove === ${r.currentMove}` }
})
)
);
console.log("All members still there");
log(results);
// Then we clean it for null values
results = results.map( r =>
Object.assign(r,{
invitee: r.invitee.filter(i => i.user_id !== null)
})
);
console.log("Now they are filtered");
log(results);
})();
} catch(e) {
console.error(e);
} finally {
mongoose.disconnect();
}
})()
Gives the output for each example as:
Mongoose: users.remove({}, {})
Mongoose: groups.remove({}, {})
Mongoose: users.insertMany([ { __v: 0, name: 'Bill', currentMove: 1, _id: 5a0afda01643cf41789e500a }, { __v: 0, name: 'Ted', currentMove: 2, _id: 5a0afda01643cf41789e500b }, { __v: 0, name: 'Fred', currentMove: 3, _id: 5a0afda01643cf41789e500c }, { __v: 0, name: 'Sally', currentMove: 4, _id: 5a0afda01643cf41789e500d }, { __v: 0, name: 'Harry', currentMove: 5, _id: 5a0afda01643cf41789e500e } ], {})
Mongoose: groups.insert({ name: 'Group1', topic: 'This stuff', currentMove: 3, _id: ObjectId("5a0afda01643cf41789e500f"), invitee: [ { user_id: ObjectId("5a0afda01643cf41789e500a"), _id: ObjectId("5a0afda01643cf41789e5014"), confirmed: false }, { user_id: ObjectId("5a0afda01643cf41789e500b"), _id: ObjectId("5a0afda01643cf41789e5013"), confirmed: false }, { user_id: ObjectId("5a0afda01643cf41789e500c"), _id: ObjectId("5a0afda01643cf41789e5012"), confirmed: true }, { user_id: ObjectId("5a0afda01643cf41789e500d"), _id: ObjectId("5a0afda01643cf41789e5011"), confirmed: false }, { user_id: ObjectId("5a0afda01643cf41789e500e"), _id: ObjectId("5a0afda01643cf41789e5010"), confirmed: false } ], __v: 0 })
Unwinding example
Mongoose: groups.aggregate([ { '$unwind': '$invitee' }, { '$lookup': { from: 'users', localField: 'invitee.user_id', foreignField: '_id', as: 'invitee.user_id' } }, { '$unwind': '$invitee.user_id' }, { '$redact': { '$cond': { if: { '$eq': [ '$currentMove', '$invitee.user_id.currentMove' ] }, then: '$$KEEP', else: '$$PRUNE' } } }, { '$group': { _id: '$_id', name: { '$first': '$name' }, topic: { '$first': '$topic' }, currentMove: { '$first': '$currentMove' }, invitee: { '$push': '$invitee' } } } ], {})
[
{
"_id": "5a0afda01643cf41789e500f",
"name": "Group1",
"topic": "This stuff",
"currentMove": 3,
"invitee": [
{
"user_id": {
"_id": "5a0afda01643cf41789e500c",
"__v": 0,
"name": "Fred",
"currentMove": 3
},
"_id": "5a0afda01643cf41789e5012",
"confirmed": true
}
]
}
]
Using $filter example
Mongoose: groups.aggregate([ { '$lookup': { from: 'users', localField: 'invitee.user_id', foreignField: '_id', as: 'inviteeT' } }, { '$addFields': { invitee: { '$map': { input: { '$filter': { input: '$inviteeT', as: 'i', cond: { '$eq': [ '$$i.currentMove', '$currentMove' ] } } }, as: 'i', in: { _id: { '$arrayElemAt': [ '$invitee._id', { '$indexOfArray': [ '$invitee.user_id', '$$i._id' ] } ] }, user_id: '$$i', confirmed: { '$arrayElemAt': [ '$invitee.confirmed', { '$indexOfArray': [ '$invitee.user_id', '$$i._id' ] } ] } } } } } }, { '$project': { inviteeT: 0 } }, { '$match': { 'invitee.0': { '$exists': true } } } ], {})
[
{
"_id": "5a0afda01643cf41789e500f",
"name": "Group1",
"topic": "This stuff",
"currentMove": 3,
"invitee": [
{
"_id": "5a0afda01643cf41789e5012",
"user_id": {
"_id": "5a0afda01643cf41789e500c",
"__v": 0,
"name": "Fred",
"currentMove": 3
},
"confirmed": true
}
],
"__v": 0
}
]
New $lookup example. Yay!
Mongoose: groups.aggregate([ { '$lookup': { from: 'users', let: { ids: '$invitee._id', users: '$invitee.user_id', confirmed: '$invitee.confirmed', currentMove: '$currentMove' }, pipeline: [ { '$match': { '$expr': { '$and': [ { '$in': [ '$_id', '$$users' ] }, { '$eq': [ '$currentMove', '$$currentMove' ] } ] } } }, { '$project': { _id: { '$arrayElemAt': [ '$$ids', { '$indexOfArray': [ '$$users', '$_id' ] } ] }, user_id: '$$ROOT', confirmed: { '$arrayElemAt': [ '$$confirmed', { '$indexOfArray': [ '$$users', '$_id' ] } ] } } } ], as: 'invitee' } }, { '$match': { 'invitee.0': { '$exists': true } } } ])
[
{
"_id": "5a0afda01643cf41789e500f",
"name": "Group1",
"topic": "This stuff",
"currentMove": 3,
"invitee": [
{
"_id": "5a0afda01643cf41789e5012",
"user_id": {
"_id": "5a0afda01643cf41789e500c",
"__v": 0,
"name": "Fred",
"currentMove": 3
},
"confirmed": true
}
],
"__v": 0
}
]
Horrible populate example :(
Mongoose: groups.find({}, { fields: {} })
Mongoose: users.find({ _id: { '$in': [ ObjectId("5a0afda01643cf41789e500a"), ObjectId("5a0afda01643cf41789e500b"), ObjectId("5a0afda01643cf41789e500c"), ObjectId("5a0afda01643cf41789e500d"), ObjectId("5a0afda01643cf41789e500e") ] }, '$where': 'this.currentMove === 3' }, { fields: {} })
All members still there
[
{
"_id": "5a0afda01643cf41789e500f",
"name": "Group1",
"topic": "This stuff",
"currentMove": 3,
"__v": 0,
"invitee": [
{
"user_id": null,
"_id": "5a0afda01643cf41789e5014",
"confirmed": false
},
{
"user_id": null,
"_id": "5a0afda01643cf41789e5013",
"confirmed": false
},
{
"user_id": {
"_id": "5a0afda01643cf41789e500c",
"__v": 0,
"name": "Fred",
"currentMove": 3
},
"_id": "5a0afda01643cf41789e5012",
"confirmed": true
},
{
"user_id": null,
"_id": "5a0afda01643cf41789e5011",
"confirmed": false
},
{
"user_id": null,
"_id": "5a0afda01643cf41789e5010",
"confirmed": false
}
]
}
]
Now they are filtered
[
{
"_id": "5a0afda01643cf41789e500f",
"name": "Group1",
"topic": "This stuff",
"currentMove": 3,
"__v": 0,
"invitee": [
{
"user_id": {
"_id": "5a0afda01643cf41789e500c",
"__v": 0,
"name": "Fred",
"currentMove": 3
},
"_id": "5a0afda01643cf41789e5012",
"confirmed": true
}
]
}
]
So using .populate()
here is actually pretty horrible. Sure it looks like less, but it's actually doing a lot of things that simply are not needed, and all because the "join" does not happen on the server:
// Note that we cannot populate "here" since we need the returned value
let results = await Group.find();
// The value is only in context as we use `Array.map()` to process each result
results = await Promise.all(
results.map( r =>
User.populate(r,{
path: 'invitee.user_id',
match: { "$where": `this.currentMove === ${r.currentMove}` }
})
)
);
console.log("All members still there");
log(results);
// Then we clean it for null values
results = results.map( r =>
Object.assign(r,{
invitee: r.invitee.filter(i => i.user_id !== null)
})
);
console.log("Now they are filtered");
log(results);
So I also included that in the output above, as well as the whole code listing.
The problem becomes evident as you cannot "chain" the populate directly to the first query. You actually need to return the documents ( potentially ALL of them ) in order to use the current document value in a subsequent populate. And this MUST be processed for each document returned.
Not only that but populate()
is NOT going to "filter" the array to only those which match, even with the query condition. All it does is set's the unmatched elements to null
:
[
{
"_id": "5a0afa889f9f7e4064d8794d",
"name": "Group1",
"topic": "This stuff",
"currentMove": 3,
"__v": 0,
"invitee": [
{
"user_id": null,
"_id": "5a0afa889f9f7e4064d87952",
"confirmed": false
},
{
"user_id": null,
"_id": "5a0afa889f9f7e4064d87951",
"confirmed": false
},
{
"user_id": {
"_id": "5a0afa889f9f7e4064d8794a",
"__v": 0,
"name": "Fred",
"currentMove": 3
},
"_id": "5a0afa889f9f7e4064d87950",
"confirmed": true
},
{
"user_id": null,
"_id": "5a0afa889f9f7e4064d8794f",
"confirmed": false
},
{
"user_id": null,
"_id": "5a0afa889f9f7e4064d8794e",
"confirmed": false
}
]
}
]
This then needs an Array.filter()
to be processed again for "each" document returned, which can finally remove the unwanted array items and give you the same result the other aggregation queries are doing.
So it's "really wasteful" and just not a good way to do things. Little point in having a database, when you're actually doing the majority of processing on the server. In fact, we may have well simply returned the populated result and then run an Array.filter()
in order to remove the unwanted entries.
This is just not how you write fast and effective code. So the example here is sometimes "what looks simple" is actually doing a lot more damage than good.
Upvotes: 3