Reputation: 243
We have 27 million documents distributed among 3 shards, each containing approximately 9 million documents. Each document has ~15 indexed fields. The requirement is that we should be able to filter these documents using combinations of the indexed fields. A count() takes at most 20 seconds, even for complex queries.
We also need to extract a specific field from the documents that match the filters, using find(). However, this sometimes takes minutes to finish, especially when the result set exceeds 1 million documents. The query is executed via a web service call, so sometimes a timeout occurs.
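For illustration, the calls are of this general shape (the field names are placeholders, not the real schema); the second argument to find() is a projection so that only the one needed field comes back:

```
// Filter on combinations of indexed fields, then pull one field out.
// fieldA, fieldB, and targetField are placeholders for the real schema.
db.documents.count({ fieldA: "x", fieldB: { $gt: 10 } })
db.documents.find(
  { fieldA: "x", fieldB: { $gt: 10 } },
  { targetField: 1, _id: 0 }   // projection: return only the field we need
)
```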
I want to know whether adding more shards can solve the problem, or whether there are other optimizations we can apply.
Thanks!
Upvotes: 2
Views: 2809
Reputation: 3893
I had the same issue on a project with tens of millions of records and complicated filtering queries.
I don't know if you have enough resources or whether it is possible for your project, but I solved it by creating a new collection that contains the results of the reports.
The system generates and updates the reports during idle time, so most of the reports are ready to use, or only need updating for the new fields.
Also, as others said, paging is a must for this kind of query.
Even if you solve the query-execution problem and make it fast enough, the HTTP request that transfers that much data will still be too slow for a good user experience.
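A minimal sketch of that approach in the mongo shell, with all collection and field names illustrative: an aggregation precomputes one report into its own collection via $merge (MongoDB 4.2+; on older versions $out is similar), and re-running it during idle time refreshes the report.

```
// Precompute report rows into their own collection (names are illustrative).
db.documents.aggregate([
  { $match: { fieldA: "x" } },             // the report's filter
  { $project: { targetField: 1 } },        // keep only what the report needs
  { $merge: { into: "report_fieldA_x" } }  // upsert results into the report collection
])

// Serving the report afterwards is a cheap read:
db.report_fieldA_x.find()
```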
Upvotes: 0
Reputation: 985
Adding more shards won't help you, but you can use pagination, which returns a limited number of documents per request; you then have to make multiple API calls.
For example:
db.users.find(/*condition*/).limit(10000)
db.users.find(/*condition*/).skip(10000).limit(10000)
db.users.find(/*condition*/).skip(20000).limit(10000)
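On the client, those pages can be fetched in a loop. A minimal sketch of the offset arithmetic (the page size and total count are illustrative; `pageOffsets` is a hypothetical helper, and each offset would drive one `.skip(offset).limit(pageSize)` web service call):

```javascript
// Compute the skip offsets needed to fetch a result set page by page.
// totalCount would come from the earlier count(); pageSize is illustrative.
function pageOffsets(totalCount, pageSize) {
  const offsets = [];
  for (let skip = 0; skip < totalCount; skip += pageSize) {
    offsets.push(skip); // each entry maps to .skip(skip).limit(pageSize)
  }
  return offsets;
}

console.log(pageOffsets(27000, 10000)); // three calls: skip 0, 10000, 20000
```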
Upvotes: 2