AntoineWDG

Reputation: 549

How to properly handle denormalized data synchronization in Doctrine MongoDB ODM

Denormalization of referenced data seems a pretty common practice when using MongoDB. Yet, I do not see any built-in way to handle that with Doctrine MongoDB ODM.

Let's say I have a social network where users can follow each other. Here are two example users:

{
  _id : id1,
  name: "Malcolm Reynolds",
  followed: []
}

{
  _id : id2,
  name: "Zoe Alleyne",
  followed: [
    { _id: id1, name: "Malcolm Reynolds" }
  ]
}

As you can see, I want the 'name' property to be denormalized. As I said, there seems to be no built-in way in Doctrine ODM to do that, and since the latest issue on the subject is a year old, I figured I would try to implement it myself.
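For reference, here is a minimal sketch of how such a structure could be mapped in Doctrine ODM, assuming an embedded FollowedUser document that carries the copied id and name (all class and field names are illustrative, not an established convention):

<?php
// Sketch only: the followed users are stored as embedded documents that
// hold a denormalized copy of the referenced user's id and name.

use Doctrine\Common\Collections\ArrayCollection;
use Doctrine\ODM\MongoDB\Mapping\Annotations as ODM;

/** @ODM\Document(collection="users") */
class User
{
    /** @ODM\Id */
    public $id;

    /** @ODM\Field(type="string") */
    public $name;

    /** @ODM\EmbedMany(targetDocument=FollowedUser::class) */
    public $followed;

    public function __construct()
    {
        $this->followed = new ArrayCollection();
    }
}

/** @ODM\EmbeddedDocument */
class FollowedUser
{
    /** @ODM\Field(type="string") */
    public $userId; // copy of the followed user's _id

    /** @ODM\Field(type="string") */
    public $name;   // denormalized copy of the followed user's name
}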

While I found plenty of articles explaining in which cases denormalization is useful and mentioning how painful it can be to keep denormalized data consistent, I didn't find one explaining how to implement the denormalized data update process.

In my case eventually consistent data is enough: a few hours can pass between the update of the user name and the update of the denormalized data. I can see 3 different ways to do it:

1 - Consistency checker: Have a task running in the background that regularly updates denormalized data.

2 - Update trigger: Each update on a name field updates all of the associated denormalized data in a single flush.

3 - Hybrid solution: When a user's name is updated, an entry describing the change is added to a queue (the user update and the queue insert would be made in a single flush), and a background task performs the actual updates.

The first solution seems the easiest to implement, but as I see it, it may be very resource-consuming. The second solution would make update requests ridiculously long; even with a high read/write ratio this might be a problem. I think the third solution is the way to go, am I right to think so?

Also, I would like to do this in a DRY way, i.e. not have to rewrite the same code in a preUpdate callback for every document whose data is denormalized. I am thinking about writing a custom annotation, is that a good idea?
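To illustrate what I have in mind for the hybrid solution in a reusable form, here is a rough sketch of an onFlush subscriber that enqueues a job document whenever a user's name changes. NameUpdateJob and the queue collection it maps to are hypothetical, and this is an untested sketch, not a finished implementation:

<?php
// Sketch of solution 3: whenever a User's name changes, persist a small
// "job" document in the same flush; a background worker later reads these
// jobs and rewrites the denormalized copies.

use Doctrine\Common\EventSubscriber;
use Doctrine\ODM\MongoDB\Event\OnFlushEventArgs;
use Doctrine\ODM\MongoDB\Events;

class NameDenormalizationSubscriber implements EventSubscriber
{
    public function getSubscribedEvents()
    {
        return [Events::onFlush];
    }

    public function onFlush(OnFlushEventArgs $args): void
    {
        $dm  = $args->getDocumentManager();
        $uow = $dm->getUnitOfWork();

        foreach ($uow->getScheduledDocumentUpdates() as $document) {
            if (!$document instanceof User) {
                continue;
            }

            $changeSet = $uow->getDocumentChangeSet($document);
            if (!isset($changeSet['name'])) {
                continue; // the name did not change, nothing to enqueue
            }

            // NameUpdateJob is a hypothetical document mapped to a queue
            // collection; $changeSet['name'][1] is the new value.
            $job = new NameUpdateJob($document->id, $changeSet['name'][1]);
            $dm->persist($job);

            // Documents persisted during onFlush need their change set
            // computed explicitly so they are written in the same flush.
            $uow->computeChangeSet($dm->getClassMetadata(NameUpdateJob::class), $job);
        }
    }
}

The subscriber would be registered on the DocumentManager's event manager, and a background worker (a cron task or a long-running process) would consume the NameUpdateJob documents and perform the actual updates on the denormalized copies.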

Upvotes: 25

Views: 961

Answers (1)

György Gulyás

Reputation: 1588

Your 2nd and 3rd solutions share one problem: if the service crashes mid-way, the update mechanism never completes and you are left with inconsistent data in your database.

We had the same problem a few years ago, and we solved it with an event mechanism (Kafka, but any other event mechanism is fine).

The user repository/service that detects the change publishes a UserChanged event, or you can use a more specific event like UserNameChanged.

All other services subscribe to the UserNameChanged topic and update their own data in the background (a minimal producer sketch follows the pros and cons below).

Pros:

  • Eventually consistent mechanism
  • Runs only when necessary
  • Because Kafka stores the events, the update always runs eventually, even if a service crashes.

Cons:

  • You need to run Kafka in your cluster. But it opens up a whole new world of opportunities for you.
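For example, the producer side could look roughly like this with the php-rdkafka extension (broker address, topic name and payload fields are placeholders, not a drop-in implementation):

<?php
// Sketch: publish a UserNameChanged event after the user's name is updated.
// Consumers that hold denormalized copies subscribe to this topic and
// update their data asynchronously.

$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', 'kafka:9092'); // placeholder broker

$producer = new RdKafka\Producer($conf);
$topic    = $producer->newTopic('user-name-changed'); // placeholder topic

$payload = json_encode([
    'userId'  => (string) $userId, // id of the renamed user
    'newName' => $newName,         // new name to copy into denormalized fields
]);

// RD_KAFKA_PARTITION_UA lets librdkafka pick the partition.
$topic->produce(RD_KAFKA_PARTITION_UA, 0, $payload);

// Make sure the message leaves the local queue before the request ends.
$producer->flush(10000);

Each service that keeps a denormalized copy of the name runs its own consumer on this topic and applies the change to its own documents in the background.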

Upvotes: 0
