I have a blog server that will contain millions of articles, and I need to be able to get all articles written by User A.
What would be the best schema design?
1) Keep User and Article as separate documents, and get User A's articles by searching all of the millions of article records for the user's id:
articles.find({Writer_id: User_A.id})
2) Put an array of article id references inside the User schema, e.g.:
const userSchema = new mongoose.Schema({
  name: String,
  age: Number,
  articles: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Article' }] // references to Article documents
});
Then find User A and do a join to get the articles back.
It's far better to keep the `Writer_id` approach and create an index on that property. If you store an array of references, then you'll need to perform an `$in` operation in your `find()` calls. This will result in your query "jumping" from one matching `Article_id` to another. If instead you have a `Writer_id` and an index built on that property, all of a user's articles will sit in the same sequential "block" of the index, requiring no jumping around whatsoever. The result is a far more read-efficient `find()` operation.
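A toy model may make the "jumping" point more concrete. The sketch below is a simplified, in-memory stand-in for an index (a sorted array searched with binary search), not how MongoDB actually stores its B-tree indexes; the function names (`findByWriter`, `findByIds`), the sample ids, and the "seek" counter are all hypothetical. Looking up by `Writer_id` takes one seek followed by a contiguous scan, while the `$in` approach needs a separate seek for every article id.

```javascript
// Simplified in-memory model of an index: NOT MongoDB's real B-tree implementation.
// Each index entry is [key, articleId]; entries are sorted by key, then by articleId.

// Order two index entries: first by key, then by articleId.
function compareEntries(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  return 0;
}

// Binary search for the first entry whose key is >= `key` (models one index "seek").
function lowerBound(index, key) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Option 1 (index on Writer_id): one seek, then read a contiguous run of entries.
function findByWriter(writerIndex, writerId) {
  const start = lowerBound(writerIndex, writerId);
  let end = start;
  while (end < writerIndex.length && writerIndex[end][0] === writerId) end++;
  return { ids: writerIndex.slice(start, end).map((e) => e[1]), seeks: 1 };
}

// Option 2 ($in over the user's stored article ids): one seek into the _id index per id.
function findByIds(idIndex, articleIds) {
  const ids = [];
  for (const id of articleIds) {
    const pos = lowerBound(idIndex, id);
    if (pos < idIndex.length && idIndex[pos][0] === id) ids.push(idIndex[pos][1]);
  }
  return { ids, seeks: articleIds.length };
}

// Hypothetical sample data: articles a1..a6 written by users u1 and u2.
const articles = [
  { _id: "a1", Writer_id: "u2" },
  { _id: "a2", Writer_id: "u1" },
  { _id: "a3", Writer_id: "u2" },
  { _id: "a4", Writer_id: "u1" },
  { _id: "a5", Writer_id: "u1" },
  { _id: "a6", Writer_id: "u2" },
];
const writerIndex = articles.map((a) => [a.Writer_id, a._id]).sort(compareEntries);
const idIndex = articles.map((a) => [a._id, a._id]).sort(compareEntries);

const byWriter = findByWriter(writerIndex, "u1"); // ids a2, a4, a5 with 1 seek
const byIds = findByIds(idIndex, ["a2", "a4", "a5"]); // ids a2, a4, a5 with 3 seeks
console.log(byWriter, byIds);
```

Both lookups return the same articles; the difference is that the `$in` version pays one seek per id. In a real index each seek walks the B-tree and may land on a different page, so treat these counts as an illustration rather than a benchmark.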
Additionally, the `articles` array approach would require an update to the user document every time an article is written (on top of inserting the article itself), whereas the `Writer_id` approach only requires inserts. Inserts are very efficient, whereas frequent updates are relatively inefficient. Finally, an array of `Article_id`s can potentially (if unlikely) push the user document past the 16 MB document size limit. The `Writer_id` approach runs into no such limitation.
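For a rough sense of scale of that limit: BSON stores an array as a sub-document whose keys are the decimal strings "0", "1", "2", and so on, and an ObjectId value is 12 bytes, so each reference costs 1 type byte + the key's digits + 1 terminating NUL byte + 12 bytes. The sketch below (the function names are illustrative) counts how many references fit in 16 MiB, assuming the array is the only thing in the document and ignoring the document and array headers, so the true figure is slightly lower.

```javascript
// Rough upper bound on ObjectId references in one BSON array.
// Assumption: nothing else is in the document; document/array headers are ignored.
const MAX_BSON_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MiB document size limit

// Bytes for array element i: type byte + key "i" as a C string + 12-byte ObjectId.
function elementSize(i) {
  return 1 + String(i).length + 1 + 12;
}

// Add elements one at a time until the next one would exceed the budget.
function maxObjectIds(budgetBytes) {
  let count = 0;
  let used = 0;
  while (used + elementSize(count) <= budgetBytes) {
    used += elementSize(count);
    count++;
  }
  return count;
}

console.log(maxObjectIds(MAX_BSON_BYTES)); // 844416
```

That works out to roughly 840,000 article ids per user before the limit is reached, which is why hitting it is possible but unlikely for a typical author.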
The difference should be relatively negligible for a smaller project, but if you're looking for scalability, then you're better off with the `Writer_id` approach.