Reputation: 1503

Maintaining consistency of mongodb data

What are the best practices, tradeoffs, and relative effectiveness of the two options below for maintaining consistency of data in MongoDB?

1. Manual caching with cron jobs (i.e. storing redundant data and using a script to periodically propagate changes)
2. Dynamically loading the data every time, but with a cache layer in front (or relying on the built-in MongoDB cache)

For example, let's say there are comments and users. With option 1, each comment would contain:

{
    user_id:
    user_displayname:
    user_gravatar:
    [comment fields]
}

If the user decided to change his or her displayname, the user object would change but also a script would run the required MongoDB commands to update all the user's comments to reflect the change.
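To make option 1 concrete, here is a minimal sketch of such a propagation script in Python. It assumes `users` and `comments` are pymongo `Collection` objects and that the user document stores the name in a `displayname` field; the function name `rename_user` and that field name are illustrative assumptions, not something from my actual schema.

```python
def rename_user(users, comments, user_id, new_displayname):
    """Update the user document, then fan the change out to the user's comments.

    `users` and `comments` are assumed to be pymongo Collection objects
    (or anything exposing update_one / update_many with the same shape).
    """
    # Source of truth: the user document itself.
    # Assumption: the user document keeps the name in "displayname".
    users.update_one(
        {"_id": user_id},
        {"$set": {"displayname": new_displayname}},
    )
    # Redundant copies: every comment written by this user.
    result = comments.update_many(
        {"user_id": user_id},
        {"$set": {"user_displayname": new_displayname}},
    )
    # Number of comment documents that were actually changed.
    return result.modified_count
```

The two updates are not atomic, so readers may briefly see the new name on the profile and the old name on comments. That window is the consistency cost of this option.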

With option 2, each comment would contain:

{ 
    user_id:
    [comment fields]
}

If the user decided to change his or her displayname, it would only be changed in the user object itself. When a comment is accessed and is not already in the cache, the application would load the user object and store it together with the comment object in the cache. That way, if this comment is accessed again while it is still in the cache, both the user and comment queries are skipped. (Am I basically describing the built-in MongoDB cache?)
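As a rough sketch of the cache layer I have in mind for option 2, the following Python keeps user objects in an in-process dictionary with a time-to-live. `UserCache`, `render_comment`, the `load_user` callback, and the `displayname` field are all hypothetical names for illustration; in practice the cache could just as well live in Redis.

```python
import time


class UserCache:
    """Tiny read-through cache with a time-to-live, keyed by user_id."""

    def __init__(self, load_user, ttl_seconds=300, clock=time.monotonic):
        self._load_user = load_user      # hypothetical: user_id -> user dict
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}               # user_id -> (expires_at, user)

    def get(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]              # cache hit: no user query
        user = self._load_user(user_id)  # cache miss: one user query
        self._entries[user_id] = (now + self._ttl, user)
        return user

    def invalidate(self, user_id):
        """Call after the user document changes so readers see the new name."""
        self._entries.pop(user_id, None)


def render_comment(comment, cache):
    """Attach the author's current display name to a comment at read time."""
    user = cache.get(comment["user_id"])
    return {**comment, "user_displayname": user["displayname"]}
```

Without the `invalidate` call, a renamed user would keep showing the old name until the entry expires, so the TTL bounds how stale a comment can look.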

Is it worth doing the data redundancy described in option 1 at all? Or is MongoDB smart enough that repeated, equivalent queries are already cached? Or is it worth using something else, such as Redis, to build a cache layer myself?

Thanks!

Upvotes: 1

Answers (2)

Chris Winslett

Reputation: 836

If you are talking about a caching mechanism for hundreds of GB of data, you are talking about a serious tradeoff. With less than 5 GB of data, the tradeoffs do not matter. Between 5 GB and 100 GB, there is a grey area.

The worst case scenario for your data is this:

200 GB of data. 4,000 reads per second. A user with 9,000 comments changes his / her name. Your application also indexes comments on this name value. Your application must then update 9,000 comments and 9,000 index keys. This will cause serious drag in your application for a while.

Then, we must also pose the question for something as simple as names on comments: "Do you have to update the names on old comments?"

When you follow a new person on Twitter, your past timeline does not inherit the person's past tweets. Only your new timeline. Same with comments, why should you update the person's name on past comments?

So, I would add a #3 to your list: "Do not update users' names"

Upvotes: 1

drmirror

Reputation: 3760

There is no query-result "cache" in MongoDB itself. With the memory-mapped storage engine, MongoDB uses memory-mapped files, and its performance depends very much on whether it can keep the most frequently used documents, your application's "working set", mapped in main memory rather than having to page each document in from disk prior to accessing it.

You are describing a denormalized database design, where each document contains attributes that would not be there in a normalized form. This can make sense, and it is in fact a very common technique with MongoDB, if it allows you to fetch all the data you need in a single operation, rather than having to do multiple queries.

The downside, as you point out, is that it requires more expensive updates, since you need to update all the documents into which a particular attribute has been denormalized. Another downside is that larger documents may make it more difficult to keep the working set in memory.

The answer therefore depends on your data access patterns. Generally, if your application is read-heavy, and it tends to need all of these denormalized attributes together, then the denormalizing approach is a good choice. If the application is write-heavy, and especially if it makes frequent updates to those particular attributes, then denormalization is not a good choice.
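To put rough numbers on that rule of thumb, here is a back-of-the-envelope sketch in Python. It is a simplified model under loud assumptions, not a benchmark: every single-document read or write counts as one operation of equal cost, caching is ignored, and the function name `ops_per_second` and its parameters are made up for illustration.

```python
def ops_per_second(comment_reads, renames, comments_per_user):
    """Return (denormalized, normalized) document operations per second.

    Rough model, not a benchmark: every count is a single-document
    read or write, and all operations are weighted equally.
    """
    # Denormalized: one read per comment view, but each rename rewrites
    # the user document plus every comment by that user.
    denormalized = comment_reads + renames * (1 + comments_per_user)
    # Normalized: each comment view also reads the user document,
    # and a rename touches only the user document.
    normalized = comment_reads * 2 + renames
    return denormalized, normalized
```

For example, `ops_per_second(4000, 1, 9000)` returns `(13001, 8001)`: if a user with 9,000 comments were renamed every second, denormalizing would cost more than it saves. With renames a thousand times rarer, the denormalized design comes out ahead, which is the read-heavy case described above.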

Upvotes: 1
