laxus
laxus

Reputation: 153

MongoDB: How can I execute multiple dereferrence in a single query?

There are several collections, i.e. Country, Province, City, Univ.

Just like in the real world, every country has several provinces, and every province has several cities, and every city has several universities.

How can I know whether a university is in the given country?For example, country0 may have some universities, what are their _ids?

Documents in those collections are showed below:

{
    _id:"country0",
    provinces:[
    {
        $ref:"Province",
        $id:"province0"
    },
    ...
    ] 
}

{
    _id:"province0",
    belongs:{$ref:"Country", $id:"country0"},
    cities:[
    {
        $ref:"City",
        $id:"city0"
    }
    ...
    ]
}

{
    _id:"city0",
    belongsTo:{$ref:"Province",$id:"province0"},
    univs:[
    {
        $ref:"Univ",
        $id:"univ0"
    }
    ...
    ]
}

{
    _id:"univ0",
    address:{$ref:"City", $id:"city0"}
}

If there are only two collections, I know fetch() may be useful.

Also, python drivers may be useful, but I can't know their performance well, because I can't use db.system.profile in a .py file.

Upvotes: 0

Views: 87

Answers (1)

wdberkeley
wdberkeley

Reputation: 11671

MongoDB doesn't do joins. N queries are required to get information from N collections. In this situation, to get the _id's of universities in a given country in an array one could do the following (in the mongo shell):

> var country = db.countries.findOne({ "_id": "country0" });
> var province_ids = [];
> country.provinces.forEach(function(province) { province_ids.push(province["$id"]); });
> var provinces = db.provinces.find({ "_id": { "$in": province_ids });
> var city_ids = [];
> provinces.forEach(function(province) { province.cities.forEach(function(city) { city_ids.push(city["$id"]); }); });
> var cities = db.cities.find({ "_id": { "$in": city_ids } });
> univ_ids = [];
> cities.forEach(function(city) { city.univs.forEach(function(univ) { univ_ids.push(univ["$id"]); }); });

It's also possible to accomplish the same thing using the belongsTo field, using similar steps. This is cumbersome and it seems like there should be a better way. There is! Normalize the data. Countries have provinces, which have cities, which have universities, but the relationships are fixed and not of huge cardinality. For doing queries like "what universities are in a given country?" I would suggest storing province documents entirely within countries and university documents entirely within city documents. You could store cities inside of province documents, or inside country documents directly, but a province or country could have hundreds or thousands of cities and this might be too much information for one document (16MB limit per document in MongoDB). Having provinces in countries and universities in cities reduces the number of queries necessary to two.

Another option is to store more information in each child document. Essentially you have a forest (a collection of trees): countries are parents of provinces which are parents of cities which are parents of universities. The belongsTo field is a parent reference. You could store a reference to all ancestors instead of just the parent. Then finding all universities in a given country is one query on the universities collection.

> db.universities.findOne();
{
    _id: "univ0",
    city: "city0",
    province: "province0",
    country: "country0"
}

> db.universities.find({ "country": "country0" });

The schema design that is best for you depends on the types of queries your application will need to answer and their relative frequencies and importance. I can't determine that from your question so I can't firmly recommend one schema over another.

As to your mini-question about performance and the db.system.profile collection, note that db.system.profile is a collection. You can query it from a .py file using a driver.

Upvotes: 1

Related Questions