Reputation: 7133
My db contains projects and phases. Projects can have multiple phases. The models are similar to these:
Phase:
var phaseSchema = new mongoose.Schema({
  project: { type: mongoose.Schema.Types.ObjectId, ref: 'Project' }
});
Project:
var projectSchema = new mongoose.Schema({
  name: { type: String }
});
Currently I'm using the following approach to retrieve the phases for each project:
var calls = [];

var projects = _.each(projects, function (p) {
  calls.push(function (callback) {
    req.app.db.models.Phase.find({ project: p._id }, function (err, doc) {
      if (err) {
        callback(err);
      } else {
        p.phases = doc;
        callback();
      }
    });
  });
});

async.parallel(calls, function (err) {
  workflow.outcome.projects = projects;
  return workflow.emit('response');
});
As you can see, I'm not passing anything to callback(); I'm just (ab)using async's parallel to hold off the response until all the lookups finish.
Alternatively I could pass the phase documents to the callback, but then in parallel's final callback I would have to iterate over both the phases and the projects to match each phase to the right project.
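Roughly, the alternative I have in mind would look something like this (untested sketch); the second matching pass at the end is the part that feels wasteful:

// Untested sketch of the alternative: each task returns its phases
// together with the project id, and the matching happens afterwards.
var calls = _.map(projects, function (p) {
  return function (callback) {
    req.app.db.models.Phase.find({ project: p._id }, function (err, doc) {
      // Pass the result out instead of mutating the project here
      callback(err, { projectId: p._id.toString(), phases: doc });
    });
  };
});

async.parallel(calls, function (err, results) {
  // Error handling omitted, as in the code above
  // Second pass: match every result back to its project
  _.each(results, function (r) {
    var project = _.find(projects, function (p) {
      return p._id.toString() === r.projectId;
    });
    project.phases = r.phases;
  });
  workflow.outcome.projects = projects;
  return workflow.emit('response');
});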
Am I falling into a common pitfall with this design, and would it for some reason be better to iterate over the projects and the phases again, or should I take a completely different approach?
Upvotes: 1
Views: 626
Reputation: 50406
I actually think in this case you would be better off running one query to match all the potential results. For that query you would issue all the project _id values in an $in clause, then just do some matching of the results against your source array to assign the matched documents:
// Make a hash from the source for ease of matching
var pHash = {};
_.each(projects, function (p) {
  pHash[p._id.toString()] = p;
});

// Run one find with $in over all the project ids
req.app.db.models.Phase.find({ "project": { "$in": _.keys(pHash) } }, function (err, response) {
  _.each(response, function (r) {
    // Assign the phases array if not already there
    if (!pHash[r.project.toString()].hasOwnProperty("phases"))
      pHash[r.project.toString()].phases = [];
    // Append to the array of phases
    pHash[r.project.toString()].phases.push(r);
  });
  // Now return the altered hash as the original array
  projects = _.values(pHash);
});
Also, since as you say "projects can have multiple phases", the logic appends to an "array" rather than assigning a single value.
On the other hand, if you have MongoDB 3.2 available, then the $lookup
aggregation pipeline operator seems to be for you. In this case you would just be working with the Project
model, but doing the $lookup
on the "phases" collection. With "collection" being the operative term here, since it is a server-side operation that therefore only knows about collections and not the application "models":
// BTW all models are permanently registered with mongoose
mongoose.model("Project").aggregate(
  [
    // Whatever your match conditions were for getting the project list
    { "$match": { .. } },

    // This actually does the "join" (but really a "lookup")
    { "$lookup": {
      "from": "phases",
      "localField": "_id",
      "foreignField": "project",
      "as": "phases"
    }}
  ],
  function (err, projects) {
    // Now all projects have an array containing any matched phases,
    // or an empty array. Just like a "left join".
  }
);
That would be the most efficient way to handle this since all the work is done on the server.
So what you seem to be asking here is basically the "reverse case" of .populate(),
where instead of holding the "phases" as references on the "project" object, the reference to the project is instead listed on the "phase".
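For contrast, the direction .populate() does cover out of the box is filling in the project from the phase side, since that is where the reference lives (assuming your model is registered as "Phase"):

// The "forward" case .populate() handles: the ref field is on Phase,
// so the referenced project can be populated from the Phase side.
mongoose.model("Phase").find({})
  .populate("project")   // replaces the ObjectId with the Project document
  .exec(function (err, phases) {
    // each phase.project is now the full Project document
  });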
In that case, either form of "lookup" should be what you are looking for: either emulating the join via the $in
query and the "mapping" stage, or directly using the aggregation framework's $lookup
operator.
Either way, this reduces the server contact down to "one" operation, whereas your current approach is going to create a lot of connections and eat up a fair amount of resources. There is also no need to "wait for all responses". I'd wager that both are much faster as well.
Upvotes: 1