Akos K
Akos K

Reputation: 7133

Delay response until all queries finished

My db contains projects and phases. Projects can have multiple phases. The models are similar to these:

Phase:

var phaseSchema = new mongoose.Schema({
  project: { type: mongoose.Schema.Types.ObjectId, ref: 'Project' }
});

Project:

var projectSchema = new mongoose.Schema({
  name : { type: String }
});

Currently I'm using the following approach to retrieve the phases for each project:

var calls = [];
var projects = _.each(projects, function (p) {
  calls.push(function (callback) {
    req.app.db.models.Phase.find({ project: p._id }, function (err, doc) {
      if (err) {
        callback(err);
      } else {
        p.phases = doc;
        callback();
      }
    });
  })
});

async.parallel(calls, function (err) {
  workflow.outcome.projects = projects;
  return workflow.emit('response');
});

As you can see I'm not passing anything to callback() just (ab)using async's parallel to wait with the response until the lookup finishes.

Alternatively I could pass the phase object to the callback but then in parallel I should iterate over phase and over projects to find the appropriate project for the current phase.

Am I falling into a common pitfall with this design and for some reason it would be better to iterate over the projects and the phases again, or I should take a completely different approach?

Upvotes: 1

Views: 626

Answers (1)

Blakes Seven
Blakes Seven

Reputation: 50406

I actually think in this case you would be better of running one query to match all the potential results. For the "test" query you would issue all the _id values as an $in clause, then just do some matching on the results to your source array to assign the match(ed) document(s):

Matching all at once

// Make a hash from the source for ease of matching
var pHash = {};
_.each(projects,function(p) {
    pHash[p._id.toString()] = p;        
});

// Run the find with $in
req.app.db.models.Phase.find({ "project": { "$in": _.keys(pHash) } },function(err,response) {
       _.each(response,function(r) {
           // Assign phases array if not already there
           if (!phash[r.project.toString()].hasOwnProperty("phases")
               pHash[r.project.toString()].phases = [];
           // Append to array of phases
           pHash[r.project.toString()].phases.push(r)
       });

       // Now return the altered hash as orginal array
       projects = _.mapObject(pHash,function(val,key) {
           return val;
       });

});

Also adding like you say "projects can have multiple phases", so the logic would be an "array" rather than an assignment of a single value.


More efficient $lookup

On the other hand, if you have MongoDB 3.2 available, then the $lookup aggregation pipeline operator seems to be for you. In this case you would just be working with the Projects model, but doing the $lookup on the `"phases" collection. With "collection" being the operative term here, since it is a server side operation that therefore only knows about collections and not the application "models":

// BTW all models are permanently registered with mongoose

mongoose.model("Project").aggregate(
    [
        // Whatever your match conditions were for getting the project list
        { "$match": { .. } },

        // This actually does the "join" (but really a "lookup")
        { "$lookup": {
            "from": "phases",
            "localField": "_id",
            "foreignField": "project",
            "as": "phases"
        }}
    ],function(err,projects) {
        // Now all projects have an array containing any matched phase
        // or an empty array. Just like a "left join"
    })
);   

That would be the most efficient way to handle this since all the work is done on the server.

So what you seem to be asking here is basically the "reverse case" of .populate() where instead of holding the "phases" as references on the "project" object the reference to the project is instead listed in the "phase".

In that case, either form of "lookup" should be what you are looking for. Either where you emulate that join via the $in and "mapping" stage, or directly using the aggregation framework $lookup operator.

Either way, this reduces the server contact down to "one" operation, where as your current approach is going to create a lot of connections and each up a fair amount of resources. Also no need to "Wait for all responses". I'd wager that both were much faster as well.

Upvotes: 1

Related Questions