Reputation: 751
After learning about performance and schema design in MongoDB, I still can´t figure out how would I make the schema design in an application when performance is a must.
Let´s imagine if we have to make YouTube to work with MongoDB as its database. How would you make the schema?
OPTION 1: two collections (videos collection and comments collection)
Pros: adding, deleting and editing comments affects only the comments collection, therefore these operations would be more efficient.
Cons: Retrieving videos and comments would be 2 different queries to the database, one for videos and one for comments.
OPTION 2: single collection (videos collection with the comments embedded)
Pros: You retrieve videos and its comments with a single query.
Cons: Adding, deleting and editing comments affect the video Document, therefore these operations would be less efficient.
So what do you think? Are my guesses true?
Upvotes: 2
Views: 244
Reputation: 21766
As usual the answer is, it depends. As as a rule of thumb you should favour embedding, unless you need to regularly query the embedded objects on its own or if the embedded array is likely to get too large(>~100 records). Using this guideline, there are a few questions you need to ask regarding your application.
How is your application going to access the data ? Are you only ever going to show the comments on the same page as the associated video ? Or do you want to provide the options to show all comments for a given user across all movies ? The first scenario favours embedding (one collection), whereas you probably would be better of with two collections in the second scenario.
Secondly, how many comments do you expect for each video ? Taking the analogy of IMDB, you could easily expect more than 100 comments for a popular video, so that means you are better off creating two separate collections as the embedded array of comments would grow large quite quickly. I wouldn't be too concerned about the overhead of an application join, they are generally comparable in speed compared to a server-side join in a relational database provided your collections are properly indexed.
Finally, how often are users going to update their comments after their initial post ? If you lock the comments after 5 minutes like on StackOverflow users may not update their comments very often. In that case the overhead of updating or deleting comments in the video collection will be negligible and may even be outweigh the cost of performing a second query in a separate comments collection.
Upvotes: 1
Reputation: 20722
As a caller in the desert, I have to say that embedding should only be used under very special circumstances:
I have written an article about the above, which describes the respective problems in greater detail.
And furthermore, I do not see any advantage in having the comments with the videos. The questions to answer would be
Note that the only connection between videos and comments here is about a given video, so you already have the _id
or something else to positively identify the video. Furthermore, you don't want to load all comments at once, especially if you have a lot of them, since this would decrease UX because of long load times.
Let's say it is the _id
. So, with it, you'd be able to have paged comments easily:
db.comments.find({"video_id": idToFind})
.skip( (page-1) * pageSize )
.limit( pageSize )
hth
Upvotes: 3
Reputation: 506
You should use embedded for better performance. Your I/O's will be lesser. In worst case? it might take a bit long to persist the document in the DB but it wont take much time to retrieve it.
You should either compromise persistence over reads or vise versa depending on your application needs.
Hence it is important to choose your db wisely.
Upvotes: 0