Reputation: 3371
In my Meteor app, users can follow each other. When one user follows another user, multiple operations need to be run on MongoDB, so the follow action itself is not fully atomic. If the server were to crash mid-execution, the vanity count on a user's profile that denotes how many users he or she is following, i.e. "following": 24, might not actually match the quantity that you would look up directly in MongoDB's followers collection, which stores the relationship data.
To mitigate this, let's say every 24 hours, at midnight, I have a cron job scheduled that looks up every user document in MongoDB and verifies the document's integrity against the data from other collections. If the data in the user's document is accurate, then it's a no-op and the job proceeds to the next user document. If any of the counts are off, it corrects them before continuing to the next user.
My question is: how do you implement this sort of cron job efficiently? Let's say I have 100,000 registered users on a Meteor app. If I did a Meteor.users.find().fetch(), I'd be loading 100k user documents into RAM before I even started iterating. The problem would compound as the userbase grows, and this feels like a server crash waiting to happen.
What is the best way to ensure that all user documents are handled, while doing so efficiently, yet not locking up the Meteor server entirely so that it can still be responsive to user requests that come through the web front-end?
Upvotes: 1
Views: 214
Reputation: 8423
Let's say I have 100,000 registered users on a Meteor app. If I did a Meteor.users.find().fetch(), I'd be loading 100k user documents into RAM before I even started iteration.
A few things on that.
First thing, you can reduce the loaded data by using a projection in your find method:
db.collection.find(query, projection)
https://docs.mongodb.com/manual/reference/method/db.collection.find/
The projection names the fields relevant to you. Mongo then returns only those fields (plus _id) for each matching document, which can cut the amount of data loaded considerably.
An example would be:
Meteor.users.find({}, {fields: {following: 1}}).fetch()
This query matches all users but returns only the _id and "following" fields of each document. Note that in Meteor the projection is passed through the fields option.
Second thing on that:
Calling .find().fetch() transforms your Mongo cursor into an array, so every matching document is loaded into memory before you iterate over it step by step.
Since this job runs on the server, where the cursor API is available to you, you may prefer to work at the cursor level. It provides a lot of methods to iterate over and manipulate the data:
https://docs.mongodb.com/manual/reference/method/js-cursor/
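As a rough illustration of working at the cursor level, here is a minimal sketch of the reconciliation pass. It assumes Meteor's synchronous (fiber-based) server API, where cursors expose forEach() and count(). The function name reconcileFollowingCounts and the followerId field are hypothetical, so adjust them to your schema.

```javascript
// Hypothetical reconciliation pass (a sketch, not a definitive implementation).
// `users` and `followers` are assumed to be server-side Meteor collections;
// `followerId` is an assumed field name in the followers collection.
function reconcileFollowingCounts(users, followers) {
  let corrected = 0;

  // Iterate the cursor directly instead of calling fetch(), and load
  // only _id and `following` for each user.
  users.find({}, { fields: { following: 1 } }).forEach((user) => {
    // Count the relationship documents that actually exist for this user.
    const actual = followers.find({ followerId: user._id }).count();

    // Write only when the stored count is off; otherwise it is a no-op.
    if (user.following !== actual) {
      users.update(user._id, { $set: { following: actual } });
      corrected += 1;
    }
  });

  return corrected; // number of user documents that were fixed
}
```

Inside the cron job this could be called as reconcileFollowingCounts(Meteor.users, Followers), where Followers stands in for whatever your relationship collection is called.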
Third thing on that:
No matter whether you use the cursor or the array (fetch) solution, you might try to implement a lazy loading mechanism, so that you process only one chunk of documents at a time instead of the whole queue. That way you guarantee that there is always room (RAM) for other things.
https://en.wikipedia.org/wiki/Lazy_loading
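One way to approximate this chunked approach is to page through the collection by _id, fetching a fixed number of documents per query. The sketch below again assumes Meteor's synchronous server API and its sort, limit and fields options. processInBatches is a hypothetical helper, not part of Meteor.

```javascript
// Hypothetical helper: walk a collection in fixed-size chunks, paging by _id.
// `collection` is assumed to expose Meteor's server-side find(selector, options).
function processInBatches(collection, batchSize, handleDoc, fields) {
  let lastId = null;
  let processed = 0;

  while (true) {
    // Resume after the last _id seen, so no document is visited twice.
    const selector = lastId === null ? {} : { _id: { $gt: lastId } };
    const options = { sort: { _id: 1 }, limit: batchSize };
    if (fields) options.fields = fields; // optional projection

    // Only `batchSize` documents are held in memory at any time.
    const batch = collection.find(selector, options).fetch();
    if (batch.length === 0) break; // nothing left to process

    batch.forEach((doc) => handleDoc(doc));
    processed += batch.length;
    lastId = batch[batch.length - 1]._id;
  }

  return processed; // total number of documents handled
}
```

For example, processInBatches(Meteor.users, 500, checkUser, { following: 1 }) would handle 500 users per query, with checkUser being your own per-user integrity check. Adding a short pause between batches is one option for leaving the server time to answer regular requests.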
If you have trouble in any of these suggestions, let me know.
Upvotes: 1