Repeating same data across collections in Azure DocumentDB

Question

When designing a conversation system on DocumentDB, is it a good idea to repeat the conversation details for every party involved?

I have sharding implemented using the first letter of the user name. Suppose user A sends a message to F, I, and Z. Because of the sharding, these users live in different collections, so the message details are repeated in each collection. This design gives me fast reads, since I only need to go to one location to display the history. But writes could be tedious, as I have to write to multiple locations.

So my question is: when building such systems with DocumentDB, is it acceptable to repeat the details? Or is it better to keep the details in a centralized collection and store only their id in each user's collection?
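
To make the two options concrete, here is a minimal sketch in Python. It does not call DocumentDB at all: plain dictionaries stand in for collections, and the names `shard_for`, `write_denormalized`, and `write_normalized`, as well as the document fields, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def shard_for(user_name: str) -> str:
    """Hypothetical shard key: the first letter of the user name, upper-cased."""
    return user_name[0].upper()


def write_denormalized(collections, message, participants):
    """Option 1: copy the full message into every participant's collection."""
    for user in participants:
        collections[shard_for(user)].append({"owner": user, **message})


def write_normalized(collections, central, message, participants):
    """Option 2: store the message once, keep only its id per participant."""
    central[message["id"]] = message
    for user in participants:
        collections[shard_for(user)].append(
            {"owner": user, "message_id": message["id"]}
        )


# User A sends one message to F, I and Z.
message = {"id": "m1", "from": "A", "to": ["F", "I", "Z"], "body": "hello"}
participants = [message["from"], *message["to"]]

# Option 1: four full copies, one per shard.
denormalized = defaultdict(list)
write_denormalized(denormalized, message, participants)

# Option 2: one full copy in a central store, four small references.
normalized = defaultdict(list)
central = {}
write_normalized(normalized, central, message, participants)
```

With option 1, showing Z's history needs only Z's collection, but any later change to the message has to reach all four copies. With option 2, the message lives in one place, and showing the history takes a second lookup by `message_id`.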

Please help.

Thank you, Soma.

Larry Maccherone · Accepted Answer

What you are describing is akin to the tradeoff between fully normalized and partially denormalized data modeling, although even that's not a perfect fit because of the different-collection issue. That said, I think the usual answer about denormalization holds in this case: "It depends."

You are thinking in the correct terms by pointing out that you make reads faster.

My advice, however, is not to denormalize unless you have evidence from production that fully normalized is not fast enough, and evidence from experiments that denormalized is faster. Every denormalization increases the risk of data corruption, and bugs of that kind are notoriously tricky to resolve. Have you tried storing it in one place? Is that fast enough? Have you done an experiment that makes you think this denormalization is faster?

Also, I have the opposite instinct about performance in this case. If you have to issue two queries and they hit different collections as opposed to one, I would expect your throughput to go up and your latency for the combined pair of operations to go down, assuming you run them in parallel.
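
As a rough illustration of the "run them in parallel" point, here is a minimal Python sketch. It makes no real DocumentDB call: `query_collection` is a hypothetical stand-in that just sleeps, the collection names and the 0.1 second delay are invented, and it assumes the two queries are independent of each other.

```python
import asyncio
import time


async def query_collection(name: str, delay: float) -> str:
    """Stand-in for one query against one collection; sleeps to simulate latency."""
    await asyncio.sleep(delay)
    return f"result from {name}"


async def read_sequential():
    """Issue the two queries one after the other."""
    first = await query_collection("user-collection", 0.1)
    second = await query_collection("message-collection", 0.1)
    return [first, second]


async def read_parallel():
    """Issue both queries at once and wait for both to finish."""
    return await asyncio.gather(
        query_collection("user-collection", 0.1),
        query_collection("message-collection", 0.1),
    )


def timed(coro_fn):
    """Run a coroutine function to completion and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


sequential_result, sequential_seconds = timed(read_sequential)
parallel_result, parallel_seconds = timed(read_parallel)
```

In this toy setup the parallel pair finishes in roughly the time of the slower single query rather than the sum of both. Whether that carries over to your workload is exactly the kind of thing to measure before denormalizing.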
