p.magalhaes

Reputation: 8374

Master Data Management using Graph Database

I am building a master database to store all relevant information about our customers. I am using Neo4j.

Below is a sample of our model. A Person can be registered in up to three of our mobile applications (App.01, App.02, App.03), keyed by CPF, which is similar to an SSN. In each app the user registers with an email, represented by the Email entity. A user can also have multiple addresses, represented by the Address entity.

Master data from John user

The question is: since I am building master data, IMO, if someone queries the MDM database asking for the "best" information about a person, I would return, for example: Name: John; Best email: email2 (because two apps use it); Best address: addr1 (because two apps use it).

So I am going to build some heuristics to define what the "best" email and address are.

For this purpose, I have some options:

  1. I could create an edge from John to email2 and to addr1, so it would be easy for a user of the MDM to get John's "best" address/email.

  2. I could build a REST API endpoint and apply this heuristic at query time.

Does anyone have experience using a graph database or designing an MDM database? Is this a good approach?

This question complements the question: Using Neo4j to build a Master Data Management

Upvotes: 1

Views: 786

Answers (2)

Hugo R

Reputation: 3113

The graph data model is a good fit for storing your master data. However, your master data will most likely coexist with operational and reference data in the form of dimensions. If you decide to go with a graph model for your MDM, make sure that you have a well-defined semantic model for the core dimensions in MDM, usually:

  1. Products
  2. Customers
  3. Employees
  4. Assets
  5. Locations

These core dimensions become attributes of your nodes.

Also, decide which MDM architecture style you are going to adopt. Some popular ones are:

  1. The Registry - A graph fits this style very well, because your master data remains in the SOR (system of record) and the references can be represented in the graph very nicely.
  2. Master Data Hub - Extra transformations are required to transpose your system of record from tabular form to the graph.
  3. Master-Master - This style fits well with an MDM in the graph if you do not have too many legacy apps that depend on your MDM.

Upvotes: 1

cybersam

Reputation: 67019

Approach 1 would add a lot of essentially redundant information (about 2N extra relationships, where N is the number of people), and also require more complex coding to handle changes to a person's apps. And, as always when information is stored redundantly, you would have to be especially careful that inconsistencies do not creep in. But, it should be faster when querying for the "best" contact info.
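To make the upkeep cost of approach 1 concrete, here is a minimal in-memory sketch in Python rather than Neo4j. The `PersonRecord` class, its method names, and the alphabetical tie-break are all hypothetical, chosen only for illustration; the stored `best_email` field stands in for the extra edge from the person to their "best" email.

```python
from collections import Counter


class PersonRecord:
    """In-memory stand-in for a Person node with a stored "best email" pointer."""

    def __init__(self, name):
        self.name = name
        self.email_by_app = {}  # app name -> email registered in that app
        self.best_email = None  # redundant, precomputed value (the extra edge)

    def set_email(self, app, email):
        self.email_by_app[app] = email
        self._refresh_best()  # every write must also refresh the stored pointer

    def remove_app(self, app):
        self.email_by_app.pop(app, None)
        self._refresh_best()

    def _refresh_best(self):
        counts = Counter(self.email_by_app.values())
        # Most apps wins; ties are broken alphabetically (an assumption)
        # so that the result is deterministic.
        self.best_email = (
            min(counts, key=lambda e: (-counts[e], e)) if counts else None
        )


john = PersonRecord("John")
john.set_email("App.01", "email1")
john.set_email("App.02", "email2")
john.set_email("App.03", "email2")
print(john.best_email)  # read is a plain attribute lookup
```

Reading `best_email` is cheap, but any write path that forgets to call `_refresh_best` leaves the stored value stale, which is the kind of inconsistency mentioned above.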

Approach 2 keeps the DB the same size, but requires a more complex and slower query to get the "best" contact info. However, changing a person's apps and contact info is straightforward.
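As a rough sketch of approach 2, the "best" value can be derived on demand from the raw registrations. The Python function below is hypothetical and works on plain (app, value) pairs instead of a real Neo4j query; the sample data for John is an assumption that merely matches the example in the question (email2 and addr1 each used by two apps), and the alphabetical tie-break is also an assumption.

```python
from collections import Counter


def best_value(registrations):
    """Return the value used by the most apps, or None if there are none.

    registrations: iterable of (app_name, value) pairs for one person.
    Ties are broken alphabetically (an assumption) to keep results stable.
    """
    # set() drops duplicate (app, value) pairs so each app counts once per value.
    apps_per_value = Counter(value for _app, value in set(registrations))
    if not apps_per_value:
        return None
    return min(apps_per_value, key=lambda v: (-apps_per_value[v], v))


# Hypothetical registrations for John.
john_emails = [("App.01", "email1"), ("App.02", "email2"), ("App.03", "email2")]
john_addresses = [("App.01", "addr1"), ("App.02", "addr1"), ("App.03", "addr2")]

print(best_value(john_emails))
print(best_value(john_addresses))
```

Nothing extra is stored, so there is nothing to keep in sync, but the counting work is repeated on every read.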

To decide which approach to use, you should consider whether DB size is an issue, and also look at your use cases and how frequently they will be performed.

Here is a simple heuristic if DB size is not an issue. Suppose G is the frequency at which you need to get a person's "best" contact info, and M is the frequency at which you need to modify a person's apps or contact info. You would pick approach 1 if the value of G/M exceeds some threshold value, K, that you would have to decide on, taking the trade-offs above into account.
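The G/M rule can be written down as a small Python function. The function name, the units, and the handling of M = 0 are assumptions made for this sketch; the value of K is not given in this answer and has to come from your own measurements.

```python
def choose_approach(gets, modifications, threshold_k):
    """Return 1 (stored "best" edges) or 2 (compute at query time).

    gets and modifications are frequencies in the same unit (e.g. per day).
    """
    if modifications == 0:
        # Assumption: with no modifications the ratio is unbounded, so the
        # stored edges never go stale; prefer them if there are any reads.
        return 1 if gets > 0 else 2
    return 1 if gets / modifications > threshold_k else 2


# Example with made-up numbers: 5000 reads and 50 writes per day, K = 20.
print(choose_approach(5000, 50, 20))
```

With these made-up numbers G/M is 100, which exceeds K = 20, so the sketch picks approach 1.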

Upvotes: 0
