Reputation: 705
I have 3 collections in MongoDB that cannot have their schema changed. Some queries need to access the 3 collections.
I know that I need multiple queries to do this but I'm not sure what the most efficient method of doing this is. The following example is simplified:
My data contains a "User" collection that serves as a logical parent to the other two collections. The other two collections are "DVD" and "CD". A user can have multiple CDs or DVDs.
User Document
{
    id : "jim",
    location : "sweden"
}
CD Document
{
    name : "White Album",
    owner : "jim"
}
DVD Document
{
    name : "Fargo",
    owner : "jim"
}
Now, the approach I am currently taking is as follows. Say I want to get back all of the CDs and DVDs for users in Sweden.
Step 1
Get all users in Sweden and return a cursor
Step 2
Iterate through each user in the cursor and perform a lookup on both the DVD and CD collections to see if the user's id matches the owner field
Step 3
If it does add the user to an array to be returned
This approach requires two additional queries per user and seems really inefficient to me. Is there a more efficient way of doing this?
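The per-user lookup described above can be sketched with in-memory lists standing in for the three collections (the data and the find helper are illustrative stand-ins, not real MongoDB calls):

```python
# In-memory stand-ins for the three collections.
users = [{"id": "jim", "location": "sweden"},
         {"id": "tom", "location": "norway"}]
cds = [{"name": "White Album", "owner": "jim"}]
dvds = [{"name": "Fargo", "owner": "jim"}]

def find(collection, **criteria):
    # Simplified equivalent of db.collection.find({...}).
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

# Step 1: get all users in Sweden.
swedish_users = find(users, location="sweden")

# Steps 2-3: one CD query and one DVD query *per user* (the N+1 pattern).
result = []
for user in swedish_users:
    if find(cds, owner=user["id"]) or find(dvds, owner=user["id"]):
        result.append(user)

print([u["id"] for u in result])  # ['jim']
```

Note that the loop issues two lookups for every matched user, which is exactly where the cost grows as the user set does.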
Upvotes: 3
Views: 1270
Reputation: 26012
You can improve the queries as follows.
db.user.find({location : "sweden"}, {id : 1})
db.cd.find({owner : {$in : ["jim", "tom", ...]}})
db.dvd.find({owner : {$in : ["jim", "tom", ...]}})
Also add indexes on the collections (e.g. on location and owner) to improve query performance.
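This batched approach, collect the user ids first and then hit each media collection once with an $in-style filter, can be sketched the same way (again with in-memory lists as stand-ins for the real collections):

```python
# In-memory stand-ins for the three collections.
users = [{"id": "jim", "location": "sweden"},
         {"id": "ann", "location": "sweden"},
         {"id": "tom", "location": "norway"}]
cds = [{"name": "White Album", "owner": "jim"}]
dvds = [{"name": "Fargo", "owner": "jim"},
        {"name": "Alien", "owner": "ann"}]

# Query 1: project just the ids of users in Sweden.
swedish_ids = [u["id"] for u in users if u["location"] == "sweden"]

# Queries 2 and 3: one $in-style pass over each media collection,
# instead of one lookup per user.
owned_cds = [c for c in cds if c["owner"] in swedish_ids]
owned_dvds = [d for d in dvds if d["owner"] in swedish_ids]

print(len(owned_cds), len(owned_dvds))  # 1 2
```

The query count is now fixed at three regardless of how many Swedish users there are, which is the main win over the per-user loop.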
Upvotes: 2
Reputation: 45
If you can't change your schema, and you want to know how many users from Sweden have a CD or DVD, then I think this is the smallest method:
That leaves you with only 2 queries, but if your DVD and CD collections are large enough this probably won't be faster than your method, even though it uses only 2 queries.
Keep in mind that a lower number of queries doesn't necessarily mean faster.
Sorry for the English ;)
Upvotes: 0
Reputation: 43884
It isn't as inefficient as it sounds.
You are most likely thinking of SQL technologies, whereby a result set is built each time you query and is in turn cached on disk or in memory.
MongoDB streams each cursor batch directly from the data files, which means its data is "live" from the database, unlike a result set. This also means that firing off the odd extra query is a lot less resource intensive.
One option is, as you said, to bring back all users and judge on each iteration whether they should be returned because they have related records. This could evenly distribute the cursor load, although there is still the possibility of cursor overload on the server.
One other option is to iterate all users from Sweden and build a big user_id array with which to query the CD and DVD collections. From there you would then match them up in your application and return as needed.
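The application-side matching this option describes, grouping the fetched CDs and DVDs back under their owners, might look like this (an in-memory sketch; the variable names and data are illustrative):

```python
from collections import defaultdict

user_ids = ["jim", "ann"]  # ids gathered from the Sweden query
cds = [{"name": "White Album", "owner": "jim"}]
dvds = [{"name": "Fargo", "owner": "jim"},
        {"name": "Alien", "owner": "ann"}]

# Group each user's media by owner id in one pass per collection.
media_by_owner = defaultdict(lambda: {"cds": [], "dvds": []})
for cd in cds:
    if cd["owner"] in user_ids:
        media_by_owner[cd["owner"]]["cds"].append(cd["name"])
for dvd in dvds:
    if dvd["owner"] in user_ids:
        media_by_owner[dvd["owner"]]["dvds"].append(dvd["name"])

print(media_by_owner["jim"])  # {'cds': ['White Album'], 'dvds': ['Fargo']}
```

Each collection is scanned once, and the join happens entirely in application memory, which is the trade-off this answer is pointing at.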
However, exactly how you solve this is up to your scenario and how much data you have.
Upvotes: 0