Reputation: 8374
I am building a master database to store all relevant information about our customers. I am using Neo4j.
Below is a sample of our model. We have Person, which can be registered in 3 of our mobile applications (App.01, App.02, App.03; we use the CPF key, which is like an SSN). In those apps the user can register with an email, which is represented by the Email entity. Users can have multiple addresses, represented by the Address entity.
The question is: since I am building master data, IMO, if someone queries the MDM database asking for all the "best" information about a person, I would return, for example:
Name: John
Best email: email2 (because two apps use it)
Best address: addr1 (because two apps use it)
So I am going to build some heuristics to decide which email and address are the "best".
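The "used by the most apps" rule from the example can be sketched in a few lines. This is a minimal Python sketch, not Neo4j code: it assumes each person's data is available as a plain dictionary mapping an app ID to the emails or addresses registered in that app, and the function name `best_by_app_count` and the alphabetical tie-break are hypothetical choices for illustration.

```python
from collections import Counter


def best_by_app_count(values_by_app):
    """Pick the value registered in the most apps.

    values_by_app maps an app ID to the values (emails or addresses) the person
    registered there. Ties are broken alphabetically, which is an assumed rule.
    """
    counts = Counter()
    for values in values_by_app.values():
        counts.update(set(values))  # count each app at most once per value
    if not counts:
        return None
    # Highest app count first; the value itself breaks ties deterministically
    return min(counts, key=lambda v: (-counts[v], v))


# Hypothetical data for John, matching the example above
john_emails = {"App.01": ["email1"], "App.02": ["email2"], "App.03": ["email2"]}
john_addresses = {"App.01": ["addr1", "addr3"], "App.02": ["addr1"], "App.03": ["addr2"]}

print(best_by_app_count(john_emails))     # email2
print(best_by_app_count(john_addresses))  # addr1
```

Because a user can have several addresses per app, the function counts distinct apps per value rather than raw occurrences.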
For this purpose, I have some options:
1. I could create an edge from John to email2 and to addr1. That would make it easy for an MDM user to get John's "best" address/email.
2. I could build a REST API endpoint and apply this heuristic at query time.
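To make the two options concrete, here is a minimal in-memory sketch in Python (not Neo4j/Cypher). The class and method names (`MDMStore`, `register`, `best_email_materialized`, `best_email_query_time`) are hypothetical, and only emails are modelled for brevity; addresses would work the same way. Option 1 stores a precomputed "best" pointer and refreshes it on every write; option 2 stores nothing extra and runs the heuristic on every read.

```python
from collections import Counter


def most_shared(values_by_app):
    """Value used by the most apps; ties broken alphabetically (assumed rule)."""
    counts = Counter()
    for values in values_by_app.values():
        counts.update(set(values))
    if not counts:
        return None
    return min(counts, key=lambda v: (-counts[v], v))


class MDMStore:
    """Hypothetical store: person -> app -> emails registered in that app."""

    def __init__(self):
        self.emails = {}      # person -> {app: set of emails}
        self.best_email = {}  # option 1 only: materialized "best" edge per person

    def register(self, person, app, email):
        apps = self.emails.setdefault(person, {})
        apps.setdefault(app, set()).add(email)
        # Option 1: every write must also refresh the materialized edge
        self.best_email[person] = most_shared(apps)

    def best_email_materialized(self, person):
        # Option 1: a cheap lookup at read time
        return self.best_email.get(person)

    def best_email_query_time(self, person):
        # Option 2: nothing extra stored; the heuristic runs on every read
        return most_shared(self.emails.get(person, {}))


store = MDMStore()
store.register("John", "App.01", "email1")
store.register("John", "App.02", "email2")
store.register("John", "App.03", "email2")
print(store.best_email_materialized("John"))  # email2
print(store.best_email_query_time("John"))    # email2
```

Both paths give the same answer; the difference is whether the cost is paid on writes (option 1) or on reads (option 2).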
Does anyone have experience using graph databases or designing MDM databases? Is this a good approach?
This question is a follow-up to the question: Using Neo4j to build a Master Data Management
Upvotes: 1
Views: 786
Reputation: 3113
The graph data model is well suited to storing your master data; however, your master data will most likely co-exist with operational and reference data in the form of dimensions. If you decide to go with a graph model for your MDM, make sure that you have a well-defined semantic model for the core dimensions in MDM, usually:
These core dimensions become attributes of your nodes.
Also, decide which MDM architecture style you are going to adopt; some popular ones are:
Upvotes: 1
Reputation: 67019
Approach 1 would add a lot of essentially redundant information (about 2N extra relationships, one "best email" and one "best address" per person, where N is the number of people) and would also require more complex code to handle changes to a person's apps. As always when information is stored redundantly, you would have to be especially careful that inconsistencies do not creep in. On the other hand, it should be faster when querying for the "best" contact info.
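One way to guard against the inconsistency risk of approach 1 is a periodic audit that recomputes the heuristic and compares it with what is stored. A minimal Python sketch, assuming the per-app registrations and the materialized "best" pointers can be exported as dictionaries; the function names and data are hypothetical:

```python
from collections import Counter


def most_shared(values_by_app):
    """Value used by the most apps; ties broken alphabetically (assumed rule)."""
    counts = Counter()
    for values in values_by_app.values():
        counts.update(set(values))
    if not counts:
        return None
    return min(counts, key=lambda v: (-counts[v], v))


def find_stale_best_edges(registrations, stored_best):
    """Return {person: (stored, expected)} for every materialized pointer that disagrees."""
    stale = {}
    for person, values_by_app in registrations.items():
        expected = most_shared(values_by_app)
        stored = stored_best.get(person)
        if stored != expected:
            stale[person] = (stored, expected)
    return stale


# Hypothetical data: Mary switched App.02 to email4, but her "best" edge was never updated
registrations = {
    "John": {"App.01": {"email1"}, "App.02": {"email2"}, "App.03": {"email2"}},
    "Mary": {"App.01": {"email3"}, "App.02": {"email4"}, "App.03": {"email4"}},
}
stored_best = {"John": "email2", "Mary": "email3"}

print(find_stale_best_edges(registrations, stored_best))
# {'Mary': ('email3', 'email4')}
```

With approach 2 this check is unnecessary, since nothing derived is stored.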
Approach 2 keeps the DB the same size, but requires a more complex and slower query to get the "best" contact info. However, changing a person's apps and contact info is straightforward.
To decide which approach to use, you should consider whether DB size is an issue, and also look at your use cases and how frequently they will be performed.
Here is a simple heuristic if DB size is not an issue. Suppose G is the frequency at which you need to get a person's "best" contact info, and M is the frequency at which you need to modify a person's apps or contact info. You would pick approach 1 if G/M exceeds some threshold value K, which you would have to decide on, taking the above factors into account.
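The rule of thumb reduces to a single comparison. A minimal Python sketch, where `reads_per_day` (G), `writes_per_day` (M) and `k` (K) are hypothetical inputs that you would measure or choose yourself:

```python
def choose_approach(reads_per_day: float, writes_per_day: float, k: float) -> int:
    """Return 1 (materialize "best" edges) if G/M exceeds K, else 2 (compute at query time)."""
    if writes_per_day == 0:
        # No modifications at all: materialized edges never go stale, so G/M is unbounded
        return 1
    return 1 if reads_per_day / writes_per_day > k else 2


print(choose_approach(reads_per_day=50_000, writes_per_day=200, k=10))  # 1
print(choose_approach(reads_per_day=1_000, writes_per_day=500, k=10))   # 2
```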
Upvotes: 0