Reputation: 83
Let's say we have a collection of users, and each user is followed by other users. If I want to find the users that are NOT following me, I need to do something like:
db.users.find({_id: { $nin : followers_ids } } ) ;
If followers_ids is huge, say 100k users, MongoDB starts complaining that the query is too large. Besides, sending that much data over the network just to make the query is not good either. What are the best practices for running this query without sending all these IDs over the network?
Upvotes: 1
Views: 2960
Reputation: 3752
I recommend that you limit the number of query results to reduce network demand. According to the docs:
MongoDB cursors return results in groups of multiple documents. If you know the number of results you want, you can reduce the demand on network resources by issuing the limit() method.
This is typically used in conjunction with sort operations. For example, if you need only 50 results from your query to the users collection, you would issue the following command:
db.users.find({ _id: { $nin: followers_ids } }).sort({ timestamp: -1 }).limit(50)
You can then use the cursor to retrieve more user documents as needed.
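As a rough sketch of paging through the result in fixed-size batches, the helper below assembles the filter, sort, and paging values as plain objects. The names `buildPageQuery`, `pageSize`, and `pageNumber` are hypothetical, and the `timestamp` field is assumed to exist on user documents as in the command above. Keep in mind that the `$nin` list still travels with every request; `limit()` only bounds what comes back.

```javascript
// Hypothetical helper: builds the pieces of a paged "not following me" query.
// Nothing here talks to MongoDB; it only assembles plain objects.
function buildPageQuery(followerIds, pageSize, pageNumber) {
  return {
    filter: { _id: { $nin: followerIds } }, // exclude users who follow me
    sort: { timestamp: -1 },                // newest first, as in the answer
    skip: pageSize * pageNumber,            // documents to skip for this page
    limit: pageSize                         // cap on documents returned
  };
}

// Possible mongo shell usage:
//   var q = buildPageQuery(followers_ids, 50, 0);
//   db.users.find(q.filter).sort(q.sort).skip(q.skip).limit(q.limit);
```

Large `skip` values get slower as the page number grows, so this is only meant for the first handful of pages.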
Recommendation to Restructure Followers Schema
I would recommend restructuring your user documents if the number of followers will grow large. Your current user schema may look like this:
{
_id: ObjectId("123"),
username: "jobs",
email: "[email protected]",
followers: [
ObjectId("12345"),
ObjectId("12375"),
ObjectId("12395"),
]
}
The good thing about this schema is that whenever this user does anything, all of the users you need to notify are right here inside the document. The downside is that to find everyone a user is following, you have to query the entire users collection. Also, the user document becomes larger and more volatile as the followers grow.
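To illustrate that downside, here is a minimal in-memory sketch of the "who is this user following" lookup against the embedded schema. The sample data and the `findFollowing` name are made up for illustration. In the shell the equivalent filter would be `db.users.find({ followers: userId })`, which has to examine every user document unless `followers` is indexed.

```javascript
// Made-up sample data shaped like the embedded schema (string ids for brevity).
const users = [
  { _id: "u1", username: "jobs", followers: ["u2", "u3"] },
  { _id: "u2", username: "woz", followers: ["u3"] },
  { _id: "u3", username: "ive", followers: [] }
];

// Hypothetical helper: returns the _id of every user whose followers array
// contains userId, i.e. everyone userId is following.
// It visits each document once, like an unindexed collection scan.
function findFollowing(allUsers, userId) {
  return allUsers
    .filter(function (u) { return u.followers.indexOf(userId) !== -1; })
    .map(function (u) { return u._id; });
}

console.log(findFollowing(users, "u3")); // [ 'u1', 'u2' ]
```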
You may want to further normalize your followers. You can keep a collection that matches followee to followers with documents that look like this:
{
_id: ObjectId("123"),//Followee's "_id"
followers: [
ObjectId("12345"),
ObjectId("12375"),
ObjectId("12395"),
]
}
This will keep your user documents slender, but it takes an extra query to get the followers. As the "followers" array changes in size, you can enable the usePowerOf2Sizes allocation strategy to reduce fragmentation and document moves.
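A minimal sketch of how the separate collection might be maintained, assuming it is named `followers` (a hypothetical name) and holds one document per followee as shown above. The helpers only build update documents; `$addToSet` and `$pull` are standard MongoDB update operators.

```javascript
// Hypothetical helpers for a "followers" collection keyed by the followee's _id.
function followUpdate(followerId) {
  // $addToSet avoids storing the same follower twice.
  return { $addToSet: { followers: followerId } };
}

function unfollowUpdate(followerId) {
  // $pull removes the follower from the array if present.
  return { $pull: { followers: followerId } };
}

// Possible mongo shell usage (collection name "followers" is an assumption):
//   db.followers.update({ _id: followeeId }, followUpdate(followerId), { upsert: true });
//   db.followers.update({ _id: followeeId }, unfollowUpdate(followerId));
//   db.followers.findOne({ _id: followeeId });  // the extra query for followers
```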
Upvotes: 1