Reputation: 357
In our database model we have a Beneficiary entity. A beneficiary can be a physical person or a corporate beneficiary; a phisical beneficiary has a number of attributes such as name, surname, sex, etc.; in addition, a beneficiary (either corporate or physical person) can either be foreign or not; this further distinction translate into different domain values for a "common" set of attributes (for example in Italy, where I live, tax ids may have a different data format than UK's tax ids).
We are now re-engineering our Beneficiary table, since the developer who initially worked on DB analysis & modeling did a (IMO) short-sighted choice. He put the primary key constraint on attribute BeneficiaryName, wich has been used to store either the Corporate name (e.g. "Microsoft Corporation") in case of Corporate beneficiary or the surname (e.g. Smith) for the physical beneficiary. This way we have the (unacceptable) constraint that we CAN'T have more than 1 beneficiary with surname "Smith" (or a corporate named "Smith") in our DB.
My approach for this "re-factoring" would introduce a generalization for the Beneficiary entity; I would
This should address the aforementioned problem of uniqueness on BeneficiaryName. Seems ok so far?
The real problem I have is: how can/should I handle the further complication added by "foreign" attribute in this model? Should I leave Foreign as it tis, i.e. a flag attribute in Beneficiary? If so, how can I handle the need for different attributes' for a conceptually similar piece of information (i.e. zipcode, tax id) withoud duplicating the attributes (zipcode_foreign, zipcode, taxid_foreign, taxid etc.)? Should I really strive to accomodate different domain values into one field?
Any suggestion would be welcome...
Upvotes: 3
Views: 1211
Reputation: 14549
For generalization there are three concepts how you can design them for a database:
(explanation for 1 generalization, 2 subtypes)
1. One table for all
Here you have all attributes from generalization and subtypes in one table.
Pros:
Cons:
PKs/FKs: one PK, no FKs
For your example: Only one table for "Beneficiary".
2. Two tables
Only tables for subtypes. No table for generalization.
Pros:
Cons:
PKs/FKs: every table it's own PK, no FKs
For your example: Two tables for "CorporateBeneficiary" and "PhysicalBeneficiary"
3. Three tables
Have an own table for generalization and for every subtype. (This is what you choosed.)
Pros:
Cons:
PKs/FKs: Ideally you have one PK Sequence used by all tables. (i.e. Oracle Sequence) In the subtype-tables the pk-column is at the same time a fk-column to the generalization table. This may be a bit complicated in DBMS where you do not have something like Oracle Sequences. There it might be necessary to have seperate pk and fk columns in the subtypes and an own PK sequence for every table. (i.e. because sequence gen is a column property)
Which one you choose depends on your requirements. 1.) and 3.) you will see very often where cases for 2.) are very rare and many developers do not know about this design.
Personally, when I do not have other boundary conditions I choose 3, because I find this the cleanest solution. But I also choosed option 1 in the past due to performance reasons. Cannot remember that I've done something like option 2 since university. :p
Upvotes: 0
Reputation: 2533
Accomodating different domain values in one field will replicate the type of problem you are now having with the main beneficiary table - at some point in the future. I would adopt this approach:
Amend your proposed keys in the CorporateBeneficiary/PhysicalBeneficiary table as follows:
Then for the foreign details adopt a similar approach:
Then create views based on left joins to create "virtual tables" - Beneficiary/PB/Foreign detail. These views will be the basis for access to the information.
If you have very large datasets (> 10^7 rows), you may find that my answer to a problem of validating postal codes is of interest. This is a very long response; is not simple; and is massive overkill unless volumes are extremely large.
EDIT
If your data volumes are lower, the approach suggested by @James Anderson, of multiple (for eg address) lines will be perfectly appropriate.
Upvotes: 2
Reputation: 18408
"Clean Beneficiary table, keeping only common data;"
Exactly what there is to do.
"Add a surrogate primary key to Beneficiary table, let's call it BeneficiaryID;"
May be useful, but don't forget that IF there exists a "natural" identifier, then the uniqueness of this should be enforced too.
"Split Beneficiary table, creating two sub-entityes (CorporateBeneficiary & PhysicalBeneficiary"
Yup. Observe that it will be hard to enforce "absolute" data integrity (enforcing at the same time that all NaturalBeneficiaries are Beneficiaries, that all NonNaturalBeneficiaries are Beneficiaries too, and that all Beneficiaries are either Natural or NonNatural Beneficairies).
"discriminated by a flag in master Beneficiary table"
Nope. Wouldn't do that. The flag is redundant, and redundancy adds complexity without adding value. If you want to know whether a Beneficiary is Natural or NonNatural, check the table where that fact is recorded.
"Find (significative) primary keys for CorporateBeneficiary & PhysicalBeneficiary;"
If You introduce a surrogate for Benficiaries in general, you don't need to replicate the natural identifiers in these other tables. That's once again redundancy, adding complexity without adding value.
"The real problem I have is: how can/should I handle the further complication added by "foreign" attribute in this model?""
You could apply the same approach, distinguishing National and ExtraNational (for both Corporate and Physical Benficiaries), and that might be anything from advisable to absolutely required if data integrity is of key importance when it concerns, say, at least the National Benficiaries. For example, legislation might apply that forces you to verify that National SSN numbers or National corporation identifying numbers are "valid" according to the National rules. If such legislation applies, it is likely to be crucial that such rules are checked in and by the DBMS, not just your app. Of course for Non-Nationals, similar checks are typically nor required, or even not possible in general.
If you take such a distinction between National and Non-National into account in your database structure, you will very likely also want to create a view that "unions" the two (National and Non-National) together, and then you will have to "transform" your data to a "unified" "common" format, which will likely be just CHAR (even if you know that, say, for the National PhysicalBeneficiaries, the contents will be their SSN number which you know consists of some fixed number of digits).
If you don't have to take such a distinction between National and Non-National into account in your database structure, then you will be forced to use that same "unified" "common" format in your single table that will be holding the data for both National and Foreign.
Upvotes: 6
Reputation: 27478
You are thinking along the correct lines.
I would suggest you have a separate "address" table to be shared by all beneficiaries. (you are really describing a maildrop which is the same for people and companies!). But the simplest way to handle the i18n problems here is to keep it simple Line1, Line2, Line3, Line4 and country code.
I would not even go far as having a specific zip/post code as these vary from a simple 4 digits, to the British /[A-Z0-9]{2,4} [A-Z0-9]{2,4}/ or even the Irish Republic with err no post codes whatsoever (Official reasoning "Oh we don't need anything like that, we know where you live."). Additionally the conventional position for the postcode varies from country to country (UK on a line on its own at the end, CH before the city name etc. etc.), some domains follow the city,county,state patterns other places don't have states or counties or do not use them in mail addresses.
Upvotes: 1