Supertype/subtype Notation for ERD

Question

This is more of a notation and 'proper procedure' type of question than anything.

Please see below an image of a few relations in my Enhanced ERD logical model. A patient can be an OUTPATIENT or a RESIDENT, but there are no attributes which are specific to OUTPATIENTS or RESIDENTS. There are relationships which are specific to the subtypes though, as only OUTPATIENTS can be associated with visits and only RESIDENTs can be associated with beds.

I am in the process of converting this to a physical data model. Obviously it makes sense to not have OUTPATIENT or RESIDENT tables and only a PATIENT table which contains a discriminator for the type of patient.

But what is the proper way to model this?
How do I now model the relationships to VISITS and BEDS while still maintaining the constraint that the discriminator must be of a certain value to qualify for those relationships?
Do I just forget about representing this constraint in the physical data model and make sure its implemented in the code when the tables are created?
Or is there a notation for physical data models which represents this type of constraint?

Section of CareCenter schema in Extended ERD

I have done much searching and cannot seem to find anything about this. All of the material I have found talks about creating subtypes for the purpose of isolating attributes specific to a subtype and not relationships specific to a subtype.

Advice or reference to data you have found that I was not able to is greatly appreciated!

(If you are really trying to make sense of my section of EERD it may be helpful to know that PATIENT is a subtype of a PERSON supertype.)

PerformanceDBA · Accepted Answer

1 Modelling & Notation

1.1 ERD

is pre-Relational, 1960’s. It cannot handle Relational Keys, which means it is hopeless for Relational Data Modelling. In the Relational paradigm, the Relational Key (which is composite) is central, therefore the identity of each entity cannot be analysed, or modelled, or defined, in ERD.

There is no definition in ERD for the Relational concepts of Independent/Dependent tables, or Identifying/Non-Identifying relations, as it is meaningless without a Relational Key, which leads to much confusion when extending ERD and attempting to add those. Further, as you have found, it has no notation of Domain/Datatype; Subtype; etc.

ERD never was a Standard. Since it is un-useable, each person who attempts to use it for an SQL implementation has to “extend” ERD, and that results in a million notations, all of which are different and incomplete. And which have to be explained to the reader. Whereas a Standard needs no explanation because it is complete and documented, once.

Technically, ERD is not a model (which implies a mathematical, logical basis). The semantics are primitive and nowhere near complete. In fact, it is hopeless for modelling, period, even for pre-Relational filing systems.

1.2 IDEF1X

is the Standard for Relational Data Modelling, available since the 1980's, a Standard since 1993. As such it is complete, whereas an extended ERD will never be complete, no matter how much you extend it.

The academics and authors of "textbooks" are clueless: as evidenced, they are 50 years behind the industry (definition) and 40 years behind (implementation on SQL platforms). They are stuck in 1960's Record Filing Systems, which is physical, characterised by a RecordID, and they market it as "relational".

Whereas Codd's Relational Model is completely logical, with a mathematical foundation, and provides far more Integrity; Power; and Speed.

To use ERD at all, you have to extend it, using some private notation, as you have done. Instead of moving incrementally and painfully in the direction of IDEF1X, I suggest you just switch to it, and obtain the full benefit. You may find this IDEF1X Introduction useful.

1.3 Logical vs Physical Data Model

There is a lot of nonsense written about the distinction.

The Logical model simply progresses, in iterations, to the point where it is stable, and then it is the Physical, which can be implemented on a specific SQL platform. That is, there is no “convert” process.

In good Data Modelling tools, such as ERwin, it is one file, not two or three, and the Logical vs Physical is simply different views of that one file. Eg. Domain in the Logical is DataType in the Physical. The Physical is of course specific to the target platform, eg. BOOLEAN in one is BIT in another. If you are not using a Data Modelling tool, or using a poor one, sure, you will have separate files and you have to deal with the attendant synchronisation problems.

But what is the proper way to model this? How do I now model the relationships to visits and beds while still maintaining the constraint that the discriminator must be of a certain value to qualify for those relationships?

In this regard, the question is not about Logical vs Physical DM, all aspects re the question are implemented in both.

Yes, it is about notation. There is no notation problem, or difference (Logical vs Physical) in IDEF1X, because it is complete.

Do I just forget about representing this constraint in the physical data model

No, they are drawn in both, they are implemented in the DDL.

and make sure its implemented in the code when the tables are created?

If you use a Data Modelling tool, it squirts out SQL that is specific to the target platform. Otherwise, sure, you have to write your own DDL and make sure it is correct. In any case, the SQL is the same (not counting the difference in SQL flavours).

Caveat. The pretend SQLs (all freeware “sqls” and Oracle) are not SQL compliant, their use of the term is not correct. They cannot implement ordinary SQL features such as Constraints for Subtypes or ACID Transactions; etc.

Or is there a notation for physical data models which represents this type of constraint?

No, there is no difference in the notation in IDFE1X. Your question appears to be due to your extensions to ERD. First, the ERD is not useable for Relational data modelling, and cannot cope with Relational Keys or Subtypes. Second, your extensions, good as they may be, do not have the ordinary Relational notation that IDEF1X has. Again, just switch to IDEF1X.

2 Codd’s Relational Model

As distinct from the variety of primitive nonsense written by the academics and in textbooks, misleadingly marketed as “relational”.

2.1 Subtype

I have done much searching and cannot seem to find anything about this. All of the material I have found talks about creating subtypes for the purpose of isolating attributes specific to a subtype and not relationships specific to a subtype.

There is no problem at all with a Subtype that has no attributes, same as there is no problem at all with a row that has no attributes. Keep in mind that each entity is a Fact (one fact in one place), and the Fact is established by the Relational Key, to which the attributes are quite secondary (Codd’s 3NF properly understood). Thus Resident and OutPatient are discrete Facts, whether each Subtype has attributes or not; whether the Fact exists for supporting a Foreign Key or not, is a separate issue.

Advice or reference to data you have found that I was not able to is greatly appreciated

You may find this Subtype document useful. For examples, go to my profile, and look up any answers that interest you.

_{If you require even further detail, there is a long discourse regarding Subtypes and notation, that I had with the single academic who is trying to cross the great chasm between academia and reality in this field, who recently "found" IDEF1X from my data models. I use a corrected form of IDEF1X (it was written by an academic), using the pre-existing IEEE notation when it is more precise. The discourse goes into the whys and wherefores of the original IDEF1X vs the corrected form. It is long at 70 posts, and there is a document that summarises it. Just ask.}

Obviously it makes sense to not have OUTPATIENT or RESIDENT tables and only a PATIENT table which contains a discriminator for the type of patient.

No. Each Subtype is a separate table, in the Logical models (first) and Physical (last), and the DDL. The physical is merely the implementation level of the Logical, you should not have anything in the Physical that is not in the Logical (you do not want to implement a thing that is not logical, not semantic; not Relational (which is absolutely logical, and unlimited).

Consider that the database may be expanded in the future, and you may have attributes in the Subtypes. - If the cluster is Exclusive, the Basetype table must have a Discriminator. - If it is Non-Exclusive, there is no Discriminator.
Supertype means something quite different, the academics use terms loosely and incorrectly. Eg. the notion of Superkey is hysterical, and anti-Relational.

2.2 Data Model

Here is the logical model in IDEF1X notation, showing attributes, not domains.

I have corrected a few errors: given the level of modelling that you have demonstrated, I don't think they need a full explanation.

Person Subtype is Non-Exclusive (no Discriminator)
Patient Subtype is Exclusive (needs a Discriminator)
That is to be used in your code to determine the Subtype, otherwise JOIN to the Subtype
Since Resident::Bed is 1::1, the attributes (Bed FK) can be located in Resident. This treatment ensures that the Bed that a Patient may be assigned to, exists.
Consider:
- When an OutPatient visits the CareCenter, is not the purpose to obtain a treatment of some kind, which must be recorded ?
- Is not the treatment obtained under a Physician’s control, and shouldn’t the treatment details be recorded ?
Therefore an OutPatient obtains a Treatment, same as a Resident, and it is common, in the Basetype.
- Visit can be eliminated (again, whether the treatment is received by a Resident or OutPatient regards the Subtype).

The data model in a PDF.

2.3 Predicate

The Predicates can be read directly from the graphic model, the evaluation of such provides an excellent feedback loop to the modelling process. Please read them and verify.

Eg. the Predicate Each Bed accommodates 0-to-n Residents would cause a brawl that can be avoided.

Again, the academics and authors do not understand the Relational Model, and thus they are clueless about Predicates. For a good introduction, refer to Relational Table Naming Convention, the Relationship, Verb Phrase section at the top, and the Predicate section at the end.

2.4 Null

Nulls in a Relational database are a clear indication of a Normalisation error. I have removed them.

3 Outstanding

_{The academics and authors understand only 1960's physical Record Filing Systems (placed in an SQL container for convenience), thus they understand only Referential Integrity. They do not understand Codd's Relational Model, thus they cannot understand, and they cannot teach, Relational Integrity, which is logical, and provides far more data integrity than 50-year-obsolete filing systems.}

Your model allows any Physician to treat any Patient, which is typical for a RFS, if you follow the literature, but sub-normal for Relational.
- I doubt that that is what you want in a database. I think you want only the treating Physician, the ProviderNo to treat the Patient.
As the model progresses, you may wish to ensure that a Bed is assigned to one Resident only. I didn’t model it because I need to be told: is admission and bed assignment two administrative steps or one ?
Do you not require lookup tables for Speciality and TreatmentName ?
Data Modelling is an iterative exercise: it is only when a model is erected, and contemplated, that the issues are exposed, which leads to the next iteration.

Supertype/subtype Notation for ERD

Section of CareCenter schema in Extended ERD

Answers (1)

1 Modelling & Notation

2 Codd’s Relational Model

3 Outstanding

Related Questions