Efficient way for mongodb.find() to search through 1 million documents?

Question

I have a blog server that will contain millions of articles, and I need to be able to get all articles written by User A.

What would be the best schema design?

1) Keep User and Article documents separate, and to get User A's articles, search all of the million records for the user's id:

articles.find({Writer_id: User_A.id})

2) Put article id references inside the User schema, e.g.:

 const userSchema = new mongoose.Schema({
    name: String,
    age: Number,
    // each entry references one Article document by its _id
    articles: [{ type: mongoose.Schema.Types.ObjectId, ref: "Article" }]
 });

Then search for User A and do a join to get the articles back.

B. Fleming · Accepted Answer

It's far better to keep the Writer_id approach and create an index on that field. If you store an array of references, you'll need an $in operation in your find() calls, which makes the query "jump" from one matching Article_id to the next. With a Writer_id field and an index built on it, all of the user's articles sit in the same sequential "block" of the index, so no jumping around is needed. The result is a far more read-efficient find() operation.
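
To make the "sequential block" idea concrete, here is a small in-memory sketch in plain JavaScript. It is only a toy model of a sorted index, not how MongoDB's storage engine actually works, and the helper names (buildIndex, lowerBound, findByKey, findByKeys) and the sample data are made up for illustration. In a real deployment the index itself would be created with db.articles.createIndex({ Writer_id: 1 }).

```javascript
// Toy model of a single-field index: entries sorted by key, each pointing
// at a document position. Illustration only, not MongoDB internals.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((a, b) => a.key - b.key || a.pos - b.pos);
}

// Binary search for the first index entry whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Writer_id-style lookup: one seek, then a sequential walk over the
// contiguous run of entries that share the key.
function findByKey(index, docs, target) {
  const results = [];
  for (let i = lowerBound(index, target); i < index.length && index[i].key === target; i++) {
    results.push(docs[index[i].pos]);
  }
  return { results, seeks: 1 };
}

// $in-style lookup on a unique key: one separate seek per requested id.
function findByKeys(index, docs, targets) {
  const results = [];
  let seeks = 0;
  for (const target of targets) {
    seeks++;
    const i = lowerBound(index, target);
    if (i < index.length && index[i].key === target) results.push(docs[index[i].pos]);
  }
  return { results, seeks };
}

// Hypothetical data: 1000 articles spread evenly over 100 writers.
const articles = Array.from({ length: 1000 }, (_, id) => ({ _id: id, Writer_id: id % 100 }));
const userA = {
  _id: 7,
  articles: articles.filter((a) => a.Writer_id === 7).map((a) => a._id),
};

const writerIndex = buildIndex(articles, "Writer_id");
const idIndex = buildIndex(articles, "_id");

const byWriter = findByKey(writerIndex, articles, userA._id);
const byIds = findByKeys(idIndex, articles, userA.articles);

console.log("Writer_id index:", byWriter.results.length, "articles,", byWriter.seeks, "seek");
console.log("$in on _id:     ", byIds.results.length, "articles,", byIds.seeks, "seeks");
```

Both lookups return the same articles, but the Writer_id lookup positions itself once and reads forward, while the $in-style lookup repositions itself once per article id.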

Additionally, the articles array approach would require frequent updates to the user document, whereas the Writer_id approach only requires inserts. Inserts are incredibly efficient, whereas frequent updates are relatively inefficient. Finally, an array of Article_ids could eventually (however unlikely) hit the 16 MB document size limit. The Writer_id approach runs into no such limitation.
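
For a sense of how far away that size limit is, here is a rough back-of-the-envelope estimate in plain JavaScript. It assumes the references are 12-byte ObjectIds and that the user document holds nothing but the articles array, so a real document with other fields would hit the limit somewhat sooner. The function names are hypothetical.

```javascript
// Rough estimate of how many ObjectId references fit in one BSON array
// before the 16 MiB per-document limit is reached.
// Assumption: the user document contains nothing but the "articles" array.
const MAX_BSON_BYTES = 16 * 1024 * 1024;
const OBJECT_ID_BYTES = 12;

// BSON stores an array as a document keyed by "0", "1", "2", ...
// Each element costs: 1 type byte + key string + 1 null terminator + value.
function elementBytes(position) {
  return 1 + String(position).length + 1 + OBJECT_ID_BYTES;
}

function maxArticleRefs(limit = MAX_BSON_BYTES) {
  // Outer document: 4-byte length + 1 terminator.
  // Field header: 1 type byte + "articles" + 1 null terminator.
  // Inner array document: 4-byte length + 1 terminator.
  const overhead = 5 + (1 + "articles".length + 1) + 5;
  let used = overhead;
  let count = 0;
  while (used + elementBytes(count) <= limit) {
    used += elementBytes(count);
    count++;
  }
  return count;
}

const refLimit = maxArticleRefs();
console.log(`Roughly ${refLimit} article references fit in one user document`);
```

Under these assumptions a single user would need several hundred thousand articles to reach the limit, which is why it is unlikely but not impossible.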

The difference should be relatively negligible for a smaller project, but if you're looking for scalability, then you're better off with the Writer_id approach.
