Royi Bernthal

Reputation: 472

Mongoose Populate Cache

I have many different mongoose schemas referencing each other by id strings.

I'm using redis to cache mongoose documents.

For instance, getUser(id) returns a previously cached user object if one exists; otherwise it calls mongoose find.

It would feel cleaner to use mongoose references and populate instead.

However, from what I understand, it's just syntactic sugar for find and doesn't have any caching layer.

Main Question

When should mongoose populate be used vs a caching layer, and what're the best practices in stable high traffic apps using mongoose?

Guiding Sub-Questions

  1. Is mongoose populate really fit for high traffic apps?
  2. Is there any benefit to using populate over caching documents myself?
  3. Is caching models myself (e.g. using redis) negligible performance-wise?
  4. What's the best practice? What do big app companies that use mongoose do?
  5. Would you mix populating mongoose references and a caching layer depending on different use cases or would you choose one and be consistent with it?

Example Use Case

Here's a common plain example from my app.

I have 3 collections: User, App, Institute.

  1. User has a ref to App
  2. App has a ref to Institute

Right now I'm:

  1. Fetching User from caching layer, which contains an app_id
  2. Fetching app from caching layer, which contains institute_id
  3. Fetching institute from the caching layer

Given a user, fetching app and institute from the caching layer is practically O(1).

However, if I choose to do pure mongoose populate, it'll take 2 extra find calls to the database - for app, and then for institute.

I need the user with app and institute populated on each authenticated request to the server.

Of course there are more complex use cases, but this is the most common one.

My simplest requests require populating 4 references on average, while the more complex ones can get to populating many more.

Upvotes: 1

Views: 1554

Answers (1)

Plpicard

Reputation: 1066

Here is my understanding of some of the pros and cons of the two.

Pros for populate of mongoose

  • No additional setup for a cache (simpler infrastructure).
  • It can deep populate (populate across multiple levels).
  • It can populate from multiple databases.
  • It has a simple, clean syntax.
  • No synchronization is required between a cache and the database, because the database is the "single" source of truth.

Cons for populate of mongoose

  • The database does the work for every populate and query, instead of your server or a caching layer. If the same instance also handles a lot of writes, some of those writes will slow down when an index needs to be recomputed or a processor-intensive query is running.
  • It relies on the inner workings of mongoose and of MongoDB.
  • It needs to be kept under control, because deep populate can get out of hand with multiple levels.

Pros for caching layer

  • There can be multiple levels of cache: some per server, plus a global cache.
  • It uses the specific strengths of a caching engine.
  • It offloads some work from the database to the cache.

Cons for caching layer

  • Need to sync state between the cache and the database
  • More infrastructure.
  • More code (if you want a clean abstraction)
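
To make the synchronization point concrete, here is a minimal cache-aside sketch. It assumes an in-memory Map standing in for Redis and a second Map standing in for the mongoose collection; findUserInDb, updateUser, dbReads and the sample data are hypothetical names for illustration, not part of any library.

```javascript
// Stand-in for Redis: an in-memory Map of key -> JSON string.
const cache = new Map();

// Stand-in for the database (hypothetical data; a real app would query a mongoose model).
const db = new Map([["u1", { _id: "u1", name: "Ada", app_id: "a1" }]]);
let dbReads = 0; // counts how often the "database" is actually hit

async function findUserInDb(id) {
  dbReads += 1;
  const doc = db.get(id);
  return doc ? { ...doc } : null;
}

// Cache-aside read: try the cache first, fall back to the database on a miss.
async function getUser(id) {
  const key = `user:${id}`;
  if (cache.has(key)) return JSON.parse(cache.get(key));
  const user = await findUserInDb(id);
  if (user) cache.set(key, JSON.stringify(user));
  return user;
}

// Write path: update the database, then drop the cached copy
// so the next read reloads fresh data instead of serving a stale entry.
async function updateUser(id, changes) {
  const current = db.get(id);
  if (!current) return null;
  const updated = { ...current, ...changes };
  db.set(id, updated);
  cache.delete(`user:${id}`);
  return { ...updated };
}
```

The invalidation in updateUser is the part that populate never needs. Every write path that skips it can leave stale data in the cache, which is the real cost behind "need to sync state".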

Overall, to answer your sub-questions:

  1. Populate can be useful in a high-traffic app for data that can't be cached and needs to be live, or for queries that run infrequently.

  2. Using populate over caching is simpler: less infrastructure, less code, no synchronization.

  3. In my experience, I would go for caching because it will be quicker on a big database. When scaling, the database tends to require more CPU and cost more money. Caching, on the other hand, is cheaper and scales well. It is also possible to cache per instance, i.e. my server checks a local cache before hitting the remote cache. This makes reads very quick, but it may affect the server's performance depending on the hosting.

  4. I am not in a big company, but our product requires transactional information and a fixed state. Populate could be used for this case because the database is the only source of truth and we don't want to have an incorrect state. Due to the replication of our database it is not strictly a single source, but at least we stay close to the database. Everywhere else we use caching. We have multiple databases and multiple database types, and caching gives us more performance. Our micro-service oriented architecture also benefits a lot from caching: it ensures that the data does not all have to live in the same database but is still fast to access.

  5. Yes, mixing is a good option depending on the use case. A general tip is to understand the potential hot spots and to spread the workload around so that no single part of the infrastructure becomes the bottleneck.
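
The per-instance cache mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: two Maps stand in for the local cache and for Redis, loadFromDb is a hypothetical stub for a database query, and the 5 second TTL is an arbitrary value.

```javascript
// Level 1: per-process cache with a short TTL.
// Level 2: shared cache (a Map standing in for Redis).
const LOCAL_TTL_MS = 5000; // arbitrary illustrative value
const localCache = new Map(); // key -> { value, expiresAt }
const remoteCache = new Map(); // key -> JSON string
const stats = { local: 0, remote: 0, db: 0 };

// Hypothetical loader standing in for a database query.
async function loadFromDb(key) {
  stats.db += 1;
  return { key, source: "db" };
}

// `now` is a parameter only so that expiry is easy to exercise in a test.
async function getCached(key, now = Date.now()) {
  const hit = localCache.get(key);
  if (hit && hit.expiresAt > now) {
    stats.local += 1;
    return hit.value;
  }
  let value;
  if (remoteCache.has(key)) {
    stats.remote += 1;
    value = JSON.parse(remoteCache.get(key));
  } else {
    value = await loadFromDb(key);
    remoteCache.set(key, JSON.stringify(value));
  }
  // Refill the local level so the next call in this process skips the network.
  localCache.set(key, { value, expiresAt: now + LOCAL_TTL_MS });
  return value;
}
```

The short local TTL bounds how long one server can serve a value that another server has already changed, at the price of more remote lookups.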

Final tip: When in doubt, keep a code interface between the data layer and the rest of the code. This abstraction is very useful if ElasticSearch, or any other caching service, needs to be used instead of Redis. A code interface postpones the need to make a commitment.

Example: instead of calling App.populate directly throughout the code, add a static method getFullApp() to your schema that calls populate() internally.


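const mongoose = require("mongoose");

const AppSchema = new mongoose.Schema({...});

AppSchema.static({
   // Single place where App documents are loaded together with their references.
   getFullApp(query) {
      // "institute" is an assumed path name; pass whichever ref paths your schema defines.
      return this.find(query).populate("institute");
   }
});

module.exports = mongoose.model("App", AppSchema);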

If you later want to drop populate, or mongoose altogether, there is only one place to change: getFullApp is a function of your code interface.
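
The same idea applied to the User → App → Institute case from the question might look like the sketch below. It is an illustration only: createUserRepository and getFullUser are hypothetical names, and the in-memory Maps stand in for whatever the fetchers really use (populate, a Redis-backed getter, or something else).

```javascript
// Hypothetical data-access module: callers only see getFullUser(),
// not whether the data came from populate, a cache, or another store.
function createUserRepository({ getUser, getApp, getInstitute }) {
  return {
    async getFullUser(id) {
      const user = await getUser(id);
      if (!user) return null;
      const app = await getApp(user.app_id);
      const institute = app ? await getInstitute(app.institute_id) : null;
      return { ...user, app: app ? { ...app, institute } : null };
    },
  };
}

// In-memory fetchers for illustration; swap these for cached or mongoose-backed ones.
const users = new Map([["u1", { _id: "u1", app_id: "a1" }]]);
const apps = new Map([["a1", { _id: "a1", institute_id: "i1" }]]);
const institutes = new Map([["i1", { _id: "i1", name: "Example Institute" }]]);

const repo = createUserRepository({
  getUser: async (id) => users.get(id) || null,
  getApp: async (id) => apps.get(id) || null,
  getInstitute: async (id) => institutes.get(id) || null,
});
```

Request handlers would call repo.getFullUser(id) and never touch mongoose or Redis directly, so changing strategy later means changing only the three fetchers.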

Upvotes: 3
