Reputation: 472
I have many different mongoose schemas referencing each other by id strings.
I'm using redis to cache mongoose documents.
For instance, getUser(id) returns a previously cached user object if one exists; otherwise it calls mongoose find.
It would feel cleaner to use mongoose references and populate instead.
However, from what I understand, it's just syntactic sugar for find and doesn't have any caching layer.
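For reference, the getUser(id) pattern described above (often called cache-aside) can be sketched roughly as follows. This is a minimal sketch, not the actual code from the app: `makeGetUser`, `cache`, `loadUserFromDb` and `makeMemoryCache` are hypothetical names, the cache is an in-memory stand-in for a Redis client, and the loader stands in for a mongoose query.

```javascript
// Cache-aside sketch. `cache` and `loadUserFromDb` are hypothetical stand-ins
// for a Redis client and a mongoose lookup of a user by id.
function makeGetUser(cache, loadUserFromDb) {
  return async function getUser(id) {
    const key = `user:${id}`;
    const cached = await cache.get(key);
    if (cached != null) {
      return JSON.parse(cached); // cache hit: no database round trip
    }
    const user = await loadUserFromDb(id); // cache miss: fall back to the database
    if (user != null) {
      await cache.set(key, JSON.stringify(user)); // store for later requests
    }
    return user;
  };
}

// In-memory stand-in so the sketch runs without Redis or MongoDB.
function makeMemoryCache() {
  const store = new Map();
  return {
    async get(key) { return store.has(key) ? store.get(key) : null; },
    async set(key, value) { store.set(key, value); },
  };
}
```

The sketch leaves out expiry and invalidation, which a real cache would need whenever a user document changes.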
Main Question
When should mongoose populate be used vs a caching layer, and what're the best practices in stable high traffic apps using mongoose?
Guiding Sub-Questions
Example Use Case
Here's a common plain example from my app.
I have 3 collections: User, App, Institute.
Right now I'm:
Given a user, fetching app and institute from the caching layer is practically O(1).
However, if I choose to do pure mongoose populate, it'll take 2 extra find calls to the database - for app, and then for institute.
I need the user with app and institute populated on each authenticated request to the server.
Of course there are more complex use cases, but this is the most common one.
My simplest requests require populating 4 references on average, while the more complex ones can get to populating many more.
Upvotes: 1
Views: 1554
Reputation: 1066
Here is my understanding of some of the pros and cons of the two.
Pros for populate of mongoose
Cons for populate of mongoose
Pros for caching layer
Cons for caching layer
Overall, to answer your sub-questions: 1. Populate may be useful in a high-traffic app for data that can't be cached and needs to be live, or for queries that don't run very often.
Using populate over caching is simpler, less infra, less code, no synchronization.
In my experience, I would go for caching because it will be quicker on a big database. When scaling, the database tends to require more CPU and cost more money. Caching, on the other hand, is cheaper and scales well. It is also possible to cache per instance, i.e. my server has a local cache that it checks before hitting the remote cache. This makes reads very quick, but it may affect the server's own performance depending on the hosting.
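A rough sketch of the per-instance idea, assuming a lookup order of local memory, then remote cache, then database. All names here (`makeTieredGetter`, `remoteCache`, `loadFromDb`, `ttlMs`) are hypothetical, and the short TTL is just one assumed way to limit how stale the local copy can get.

```javascript
// Two-tier lookup sketch: in-process Map first, then a remote cache, then the database.
// `remoteCache` and `loadFromDb` are hypothetical; `ttlMs` bounds local staleness.
function makeTieredGetter(remoteCache, loadFromDb, ttlMs = 5000, now = Date.now) {
  const local = new Map(); // key -> { value, expiresAt }
  return async function get(key) {
    const entry = local.get(key);
    if (entry && entry.expiresAt > now()) {
      return entry.value; // served from this server's own memory
    }
    let value = await remoteCache.get(key);
    if (value == null) {
      value = await loadFromDb(key); // both caches missed
      if (value != null) await remoteCache.set(key, value);
    }
    if (value != null) {
      local.set(key, { value, expiresAt: now() + ttlMs });
    }
    return value;
  };
}
```

The local Map lives in the server process, so its size competes with the application for memory, which is the hosting trade-off mentioned above.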
I am not in a big company, but our product requires transactional information and a fixed state. Populate could be used for this case because the database is the only source of truth and we don't want to end up with an incorrect state. Because our database is replicated it is not strictly a single source, but at least we stay close to the database. Everywhere else we use caching. We have multiple databases and multiple database types, and caching gives us more performance. Our microservice-oriented architecture also benefits a lot from caching: the data does not all have to live in the same database but is still fast to access.
Yes, mixing is a good option depending on the use case. A general tip is to understand the potential hot spots and to spread the workload around so that no single part of the infrastructure becomes the bottleneck.
Final tip: when in doubt, make sure to keep a code interface between the data layer and the rest of the code. This abstraction is very useful if Elasticsearch, or any other caching service, needs to be used instead of Redis. A code interface postpones the need to commit to one choice.
Example: instead of using App.populate directly in a piece of code, add a method getFullApp() to your schema that does the populate call.
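const mongoose = require("mongoose");

const AppSchema = new mongoose.Schema({...});
AppSchema.static({
  // Single entry point for loading apps with their references resolved.
  // Pass the reference path(s) to populate(); they are elided here, like the schema fields.
  getFullApp(query) {
    return this.find(query).populate();
  }
});
module.exports = mongoose.model("App", AppSchema);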
If you want to get rid of populate, or of mongoose altogether, there is only one place to change: getFullApp is a function of your code interface.
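To illustrate why the interface helps, here is a small sketch in which callers only ever see getFullApp(id) and the backend behind it can be swapped. `makeAppRepository`, `fetchFullApp` and both backends are hypothetical; in a real app the first backend would wrap a mongoose populate query and the second a cache lookup.

```javascript
// Callers depend only on getFullApp(id); which backend answers is a wiring detail.
function makeAppRepository(backend) {
  return {
    getFullApp(id) {
      return backend.fetchFullApp(id);
    },
  };
}

// Hypothetical backend standing in for a mongoose query with populate.
const populateBackend = {
  async fetchFullApp(id) {
    return { _id: id, source: "database" };
  },
};

// Hypothetical backend standing in for a cache lookup.
const cacheBackend = {
  async fetchFullApp(id) {
    return { _id: id, source: "cache" };
  },
};

// Switching strategies is a one-line change here; call sites stay the same.
const apps = makeAppRepository(cacheBackend);
```

Code elsewhere would call apps.getFullApp(id) and never mention populate or Redis directly.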
Upvotes: 3