Combine $geoNear with Another Collection

Question

I have 2 collections, resto and meal (each meal document has resto id to which it belongs to). I want to fetch nearby restos that have at least 1 meal. Right now, I'm able to fetch nearby restaurants, but how do I combine make sure they have at least 1 meal?

restoModel.aggregate([{
    "$geoNear": {
        "near": {
            "type": "Point",
            "coordinates": coordinates
        },
        "minDistance": 0,
        "maxDistance": 1000,
        "distanceField": "distance",
        "spherical": true,
        "limit": 10 // fetch 10 restos at a time
    }
}]);

Sample resto doc:

{ 
  _id: "100", 
  location: { coordinates: [ -63, 42 ], type: "Point" },
  name: "Burger King"
}

Sample meal doc:

{ 
  resto_id: "100", // restaurant that this meal belongs to
  name: "Fried Chicken",
  price: 12.99
}

I can create a pipeline, fetch 10 restaurants each joined with its associated meal documents, and remove restaurants that don't have meal. But a single fetch could return 0 documents if all of them have no meal. How do I make sure it keeps searching until 10 meal-having restos are returned?

Neil Lunn · Accepted Answer

This actually has a few approaches to consider, which have their own benefits or pitfalls associated.

Embedding

The cleanest and simplest approach is simply to actually embed the "menu" and "count" within the parent document of the restaurant instead.

This is actually also quite reasonable since you appear to be stuck in thinking in relational modelling terms, where MongoDB is not an RDBMS and nor "should" it generally be used as one. Instead we play to the strengths of what MongoDB can do.

The structure would then be like this:

{ 
  _id: "100", 
  location: { coordinates: [ -63, 42 ], type: "Point" },
  name: "Burger King",
  menuCount: 1,
  menu: [
    {
      name: "Fried Chicken",
      price: 12.99
    }
  ]
}

This is actually then quite simple to query, and in fact we can simply apply using a regular $nearSphere since we really have no further need for aggregation conditions:

restoModel.find({
  "location": {
    "$nearSphere": {
      "$geometry": {
        "type": "Point",
        "coordinates": coordinates
      },
      "$maxDistance": 1000
    }
  },
  "menuCount": { "$gt": 1 }
}).skip(0).limit(10)

Simple and effective. This is in fact exactly why you should be using MongoDB, since the "related" data is already embedded in the parent item. There are of course "trade-offs" to this, but the biggest advantages are in speed and efficiency.

Maintaining the menu items within the parent as well as the present count is also simple, as we can simply "increment" the count when new items are added:

restoModel.update(
  { "_id": id, "menu.name": { "$ne": "Pizza" } },
  {
    "$push": { "menu": { "name": "Pizza", "price": 19.99 } },
    "$inc": { "menuCount": 1 }
  }
)

Which adds the new item where it does not already exist and increments the number of menu items, all in one atomic operation, which is another reason why you embed relationships where the updates have an effect on both parent and child at the same time.

This is really what you should be going for. Sure there are limits to what you actually can embed, but this is just a "menu" and is of course relatively small in size comparison to the other sorts of relationships we could define.

Elliot of MongoDB actually put it best by stating "The entire content of War and Peace as text fit's within 4MB", and that was at a time when the limit on a BSON Document was 4MB. Now it's 16MB and more than capable of handling any "menu" most customers could be bothered browsing through.

Aggregate with $lookup

Where you are keeping to a standard relational pattern there are going to be some problems to overcome. Mostly here is the big difference from "embedding" is that since the data for the "menu" is in another collection, then you need $lookup in order to "pull" those in, and subsequently "count" how many there are.

In relation to a "nearest" query, unlike the sample above we cannot put those additional constraints "within the 'near' query itself", which means that out of the default 100 results returned by $geoNear, some of the items "may not" meet the additional constraint, which you have no choice but to apply later, "after" the $lookup is performed:

restoModel.aggregate([
  { "$geoNear": {
    "near": {
      "type": "Point",
      "coordinates": coordinates
    },
    "spherical": true,
    "limit": 150,
    "distanceField": "distance",
    "maxDistance": 1000
  }},
  { "$lookup": {
     "from": "menuitems",
     "localField": "_id",
     "foreignField": "resto_id",
     "as": "menu"
  }},
  { "$redact": {
    "$cond": {
      "if": { "$gt": [ { "$size": "$menu" }, 0 ] },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  { "$limit": 10 }
])

As such your only option here is to "increase" the number of "possible" returns, and then do the additional pipeline stages to "join", "calculate" and "filter". Also leaving the eventual $limit to it's own pipeline stage.

A noted problem here is with "paging" of results. Which is because the "next page" needs to essentially "skip over" the results of the prior page. To this end, it is better to implement a "forward paging" concept, much as described in this post: Implementing Pagination In MongoDB

Where the general idea is to "exclude" the prior "seen" results, via $nin. This is actually something that can be done using the "query" option of $geoNear:

restoModel.aggregate([
  { "$geoNear": {
    "near": {
      "type": "Point",
      "coordinates": coordinates
    },
    "spherical": true,
    "limit": 150,
    "distanceField": "distance",
    "maxDistance": 1000,
    "query": { "_id": { "$nin": list_of_seen_ids } }
  }},
  { "$lookup": {
     "from": "menuitems",
     "localField": "_id",
     "foreignField": "resto_id",
     "as": "menu"
  }},
  { "$redact": {
    "$cond": {
      "if": { "$gt": [ { "$size": "$menu" }, 0 ] },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  { "$limit": 10 }
])

Then at least you don't get the same results as the previous page. But it's a bit more work, and quite a lot more work than what can be done with the embedded model as shown earlier.

Conclusion

The general case leads towards "embedding" as the better option for this use case. You have a "small" number of related items, and the data makes more sense actually being directly associated with the parent as typically you want the menu and the restaurant information at the same time.

Modern releases of MongoDB since 3.4 do allow a "view" to be created, but the general premise is based on usage of the aggregation pipeline. As such we could "pre-join" the data in a "view", however since any query operations effectively pick up the underlying aggregation pipeline statement to process, the standard query operators of $nearSphere and the like cannot be applied, as standard queries are actually "appended" to the defined pipeline. In a similar manner you also cannot use $geoNear with "views".

Maybe the constraints will change in the future, but right now the limitations make this not viable as an option since we cannot perform the required queries on the "pre-joined" source with a more relational design.

So you can basically do it in either of the two ways presented, but for my money I would model as embedded here instead.

Combine $geoNear with Another Collection

Answers (1)

Embedding

Aggregate with $lookup

Conclusion

Related Questions