Soma Yarlagadda

Repeating the same data across collections in Azure DocumentDB

When designing a conversation system on DocumentDB, is it a good idea to repeat the conversation details for every party involved?

I have implemented sharding based on the first letter of the user name. Now suppose user A sends a message to users F, I and Z. Because these users live in different collections (due to the sharding), the message details are repeated in each collection. This design helps with reads: displaying a user's history means going to only one location. But writes could be tedious, since I have to write to multiple locations.
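A minimal sketch of the denormalized design described above. It assumes an in-memory dictionary as a hypothetical stand-in for the sharded collections (it is not the DocumentDB SDK); `shard_for`, `send_denormalized` and `history` are illustrative names. The shard key is the first letter of the user name, as in the question.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for sharded collections:
# collection name -> list of documents. This is NOT the DocumentDB SDK.
store: dict[str, list[dict]] = defaultdict(list)

def shard_for(user: str) -> str:
    """Shard key from the question: first letter of the user name."""
    return "users_" + user[0].upper()

def send_denormalized(sender: str, recipients: list[str], text: str, msg_id: str) -> int:
    """Write a full copy of the message into every participant's shard.

    Returns the number of shard writes performed. Users who share a shard
    share one copy, so this is one write per distinct shard, not per user."""
    doc = {"id": msg_id, "from": sender, "to": list(recipients), "text": text}
    targets = {shard_for(u) for u in [sender, *recipients]}
    for shard in sorted(targets):
        store[shard].append(dict(doc))  # a separate document in each shard
    return len(targets)

def history(user: str) -> list[dict]:
    """Reading a user's history touches exactly one shard."""
    return [m for m in store[shard_for(user)]
            if m["from"] == user or user in m["to"]]

# User A messages F, I and Z: four shards, four copies of the same message.
writes = send_denormalized("Alice", ["Frank", "Ivy", "Zoe"], "hi", "m1")
```

Reads stay single-location, but any later change to message `m1` has to update all four copies consistently, which is where the write cost lives.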

So my question is: when building such systems on DocumentDB, is it acceptable to repeat the details? Or is it better to have a centralized collection for the message details and store only the message id in each user's collection?
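For comparison, a minimal sketch of the centralized alternative: each message is stored once, and every participant's collection holds only its id. Again the dictionaries are hypothetical in-memory stand-ins, not the DocumentDB SDK, and the function names are illustrative.

```python
# Hypothetical in-memory stand-ins (NOT the DocumentDB SDK):
messages: dict[str, dict] = {}          # central collection: message id -> message
user_index: dict[str, list[str]] = {}   # per-user collection: user -> message ids

def send_normalized(sender: str, recipients: list[str], text: str, msg_id: str) -> None:
    """Store the message once; record only its id for each participant."""
    messages[msg_id] = {"id": msg_id, "from": sender, "to": list(recipients), "text": text}
    for user in {sender, *recipients}:
        user_index.setdefault(user, []).append(msg_id)

def edit_message(msg_id: str, text: str) -> None:
    """A single write updates what every participant sees."""
    messages[msg_id]["text"] = text

def history(user: str) -> list[dict]:
    """Two steps: read the user's ids, then resolve them in the central collection."""
    return [messages[mid] for mid in user_index.get(user, [])]

send_normalized("Alice", ["Frank", "Ivy", "Zoe"], "hi", "m1")
edit_message("m1", "hello")
```

The trade-off is the mirror image of the denormalized version: the message body is written once, but reading a history needs a second lookup.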

Please help.

Thank you, Soma.

Answers (2)

Larry Maccherone

What you are describing is akin to the tradeoff between fully normalized and partially denormalized data modeling, although even that is not a perfect fit because of the separate-collections issue. That said, I think the usual answer to denormalization questions holds here too: "It depends."

You are thinking in the right terms by pointing out that this design makes reads faster.

My advice, however, is not to denormalize unless you have evidence from production that the fully normalized design is not fast enough, and evidence from experiments that the denormalized one is faster. Every denormalization increases the risk of data corruption, and bugs of that kind are notoriously tricky to resolve. Have you tried storing it in one place? Is that fast enough? Have you run an experiment that makes you think this denormalization is faster?

Also, I have the opposite instinct about performance in this case. If you have to issue two queries and they hit two different collections rather than the same one, I would expect your throughput to go up and the latency of the combined pair of operations to go down, assuming you run them in parallel.
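A minimal sketch of the "run them in parallel" point, assuming the two queries are independent (neither needs the other's result). `query` is a hypothetical stand-in in which `asyncio.sleep` simulates a round trip to a collection; a real database client call would go in its place.

```python
import asyncio
import time

async def query(collection: str, latency: float) -> str:
    """Hypothetical query; asyncio.sleep stands in for a network round trip."""
    await asyncio.sleep(latency)
    return f"result from {collection}"

async def sequential() -> list[str]:
    # Second query starts only after the first finishes: latencies add up.
    return [await query("users_A", 0.1), await query("messages", 0.1)]

async def parallel() -> list[str]:
    # Both queries are in flight at once: total is roughly the slower one.
    return list(await asyncio.gather(query("users_A", 0.1), query("messages", 0.1)))

def timed(coro_fn):
    """Run a coroutine function to completion and measure wall-clock time."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start

seq_result, seq_time = timed(sequential)
par_result, par_time = timed(parallel)
```

With these simulated latencies the sequential pair takes about 0.2 s and the parallel pair about 0.1 s; real numbers depend on the service and the client.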

hsulriksen

It seems to me you're partitioning on user. Could this be a suitable case for partitioning by conversation id instead? You could then keep track of each user's conversations by userId. That will require an extra call to get the conversation id, but once you have it you should be all set.
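A minimal sketch of partitioning by conversation id, assuming hypothetical in-memory partitions (not the DocumentDB SDK) and a small fixed partition count chosen only for illustration. Each message is written once, into its conversation's partition; a separate userId index supplies the extra lookup the answer mentions.

```python
import zlib

# Hypothetical in-memory model (NOT the DocumentDB SDK).
NUM_PARTITIONS = 4  # illustrative value
partitions: list[dict[str, dict]] = [{} for _ in range(NUM_PARTITIONS)]
user_conversations: dict[str, set[str]] = {}  # userId -> conversation ids

def partition_for(conversation_id: str) -> int:
    # crc32 gives a stable hash; Python's built-in hash() of str varies per process.
    return zlib.crc32(conversation_id.encode()) % NUM_PARTITIONS

def post(conversation_id: str, participants: set[str], sender: str, text: str) -> None:
    """Write the message once, into the conversation's partition."""
    conv = partitions[partition_for(conversation_id)].setdefault(
        conversation_id,
        {"id": conversation_id, "participants": sorted(participants), "messages": []},
    )
    conv["messages"].append({"from": sender, "text": text})
    for user in participants:
        user_conversations.setdefault(user, set()).add(conversation_id)

def history(user: str) -> list[dict]:
    # Extra call: look up the user's conversation ids first...
    ids = sorted(user_conversations.get(user, set()))
    # ...then each conversation is read from a single partition.
    return [partitions[partition_for(cid)][cid] for cid in ids]

group = {"Alice", "Frank", "Ivy", "Zoe"}
post("conv-1", group, "Alice", "hi")
post("conv-1", group, "Zoe", "hey")
```

Here the whole thread lives in one place, so there are no copies to keep in sync, at the cost of the index lookup before the read.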