Travis
Travis

Reputation: 705

What is the most efficient way to query multiple collections in MongoDB?

I have 3 collections in MongoDB that cannot have their schema changed. Some queries need to access the 3 collections.

I know that I need multiple queries to do this but I'm not sure what the most efficient method of doing this is. The folllowing example is simplified :

My data contains a "User" collection that serves as a logical parent to the other two collections. The other two collections are "DVD" and "CD". A user can have multiple CDs or DVDs

User Document 
id : "jim",
location : "sweden"

CD Document
name : "White Album",
owner : "jim"

DVD Document
name : "Fargo",
owner : "jim"

Now, the approach I am currently taking is as follows. If I want get back all of the CDs and DVDs for users in Sweden.

Step 1

Get all users in Sweden and return a cursor

Step 2

Iterate through the each user in the cursor and perform a lookup on both the DVD and CD collections to see if the users id matches the owner field

Step 3

If it does add the user to an array to be returned

This approach requires 2 additional queries and seems really inefficient to me. Is there a more efficient way of doing this?

Upvotes: 3

Views: 1270

Answers (3)

Parvin Gasimzade
Parvin Gasimzade

Reputation: 26012

You can make some improvements on the query as follows.

  • While selecting users, return only the id field.

db.user.find({location:"sweden"},{id:1})

  • Create a String list that contains user names and pass those list using the $in query.Run $in query on cd & dvd collections as follows :
db.cd.find({owner : {$in : ["jim", "tom", ...]}})
db.dvd.find({owner : {$in : ["jim", "tom", ...]}})

Also add indexes on the collections to improve query performances.

Upvotes: 2

Frank Vieira
Frank Vieira

Reputation: 45

If you can't change your schema, and you want know how many users from Sweden have a CD or DVD, then i think this is the smallest method:

  • users_ids * = Get all users_id from DVD and CD collections. Get all users that have itself id in * users_ids * and are from Sweden.

Then you are with only 2 queries but if your DVD and CD collections are giant enough this probably won't be faster than your method, even that this method only use 2 queries.

Keep in mind that lower number of queries doesn't mean necessarily faster.

Sorry for the english ;)

Upvotes: 0

Sammaye
Sammaye

Reputation: 43884

It isn't as inefficient as it sounds.

You are most likely thinking of SQL techs whereby a result set is made each time you query and that is in turn cached on disk or in memory.

MongoDB streams directly from the data files every cursor batch which means its data is "live" from the database unlike a result set. This also means that pinging the odd query is also a lot lees resource intensive.

One option is, as you said, bring back all users and each iteration judge if they should be displayed since they have related records. This could evenly distribute the cursors stopping overloading however, there is still the possibility of cursor overload on the server.

One other option is to iterate all users from Sweden and get back a huge user_id array with which to query the CD and DVD collection. From there you would then match them up in your application and return as needed.

However, exactly how you solve this is upto your scenario and how much data you have.

Upvotes: 0

Related Questions