Reputation: 171
We need to be able to quickly perform queries across the set of a user's friends and friends of friends. This would be relatively straightforward in a relational database, but I'm somewhat stuck on the best way to accomplish it in MongoDB. We store the user IDs of a user's friends in an array in the user document, so the obvious solution is to do this:
While straightforward, this seems like a huge amount of back and forth, as compared to what we could do with a join in a relational database. Is there a more efficient way to do this in MongoDB, or is this a problem best suited for a RDBMS?
Upvotes: 17
Views: 8329
Reputation: 161
I asked Eliot Horowitz this very same question recently at MongoDB SV conference. He said the way he would structure it is to store each users friends as embedded documents within each user. For example, the structure might look like this:
{
_id : ObjectId("4e77bb3b8a3e000000004f7a"),
username : "alex",
friends : ["283956723823626626aa", "226567377578888888as", "8738783888aas88a8a88" ]
}
then you can have an index on user.friends
http://www.mongodb.org/display/DOCS/Indexes#Indexes-IndexingArrayElements
"When a document's stored value for a index key field is an array, MongoDB indexes each element of the array. See the Multikeys page for more information."
so to find all of "alex"'s friends I can just do:
db.user.find( { 'friends' : '4e77bb3b8a3e000000004f7a'});
Upvotes: 13
Reputation: 6510
I believe that this is something that is better handled by an RDBMS (barring graph DBs) since you clearly need to perform a "join" operation. Although an RDBMS might implement it the same way, it could implement the join more efficiently and distribute the information more efficiently than MongoDB.
With that said, the overhead of performing the "join" query atomically might prove too costly if you have a large cluster of db nodes and a massive amount of users.
If you're not worried about consistency and atomicity of the query, and all you want is to prevent the back-and-forth between the application and the DB, you can write a JavaScript function which will perform the entire query on MongoDB, or use a MapReduce operation for more efficient distributed queries.
Upvotes: 2
Reputation: 45307
this seems like a huge amount of back and forth, as compared to what we could do with a join in a relational database
This is all very relative. Your basic assumption on fetching "friends of friends of friends" is correct, it's a few hops and a couple of in-memory "distincts".
However, from the raw perspective of "total work done", this isn't very different from what you would have do with SQL. Yes it's a relatively simple SQL query, but the server itself still has to do basically the same amount of work, give or take some some network traffic.
Is there a more efficient way to do this in MongoDB, or is this a problem best suited for a RDBMS?
Is there a better way in MongoDB? Probably not. But doing "self-joins" in SQL doesn't easily scale across multiple servers. In fact, trying to do this across multiple servers basically devolves into a similar process to the MongoDB process.
Technically, this is a job best done by a Graph Database which is neither MongoDB nor an RDBMS.
For Graph Databases you can take a look at Trinity for .NET or NEO4J.
Upvotes: 7