Dynamic content types: One table with many columns or one table for each?

Question

You have a form editor and can add form fields (inputs). Each form field translates to a different type of data:

A </code> becomes a <code>TEXT</code> column.</li> <li>An <code><input type="text"></code> becomes a <code>VARCHAR(X)</code>.</li> <li>An <code><input type="range"></code> becomes a <code>FLOAT</code>.</li> <li>A <code><select></code> could be a comma-separated list (<code>TEXT</code>), a separate table with each entry (foreign key) or a reference to an existing table (foreign key).</li> </ul> <p>An easy way to store the structure of each:</p> <p><a href="https://i.sstatic.net/BPe5K.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/BPe5K.png" alt="enter image description here" /></a></p> <p>Two solutions to store the data, if someone would fill such a dynamic form:</p> <ul> <li>Create a table where there is a NULLable column for each expected data type (<code>VARCHAR</code>, <code>TEXT</code>, <code>FLOAT</code>, ...). This leaves a lot of ugly <code>NULL</code> values, since each form field value only occupies one column. Also, <code><select></code> fields would be problematic if they refer to an existing table.</li> <li>Create a table for each expected data type. This would keep things cleaner, but require a lot more complexity when storing and fetching data, since the program has to know the type, use the right table, etc.</li> </ul> <p>This problem applies to many other things: a template editor, a settings editor, etc.</p> <p>How do we store the data for such dynamic forms?</p> <p>I wouldn't consider using an additional non-relational database. I want to keep the data in one place.</p>

PerformanceDBA · Accepted Answer

I'm asking about a "general" approach with this problem, not specifically for form editors

This thing keeps popping up again and again and everyone I know seems to have a different take on it - but none of them has a really "proper" solution

After the Relational requirement, which is of course the highest order directive, I will take generic approach and proper solution as over-arching directives.

1 Elevation to Relational

But first, a correction. In order to elevate the data model to Relational, without which the rather straight-forward solution would be exceedingly complex (which is precisely why [a] there are so many incorrect methods, and [b] none of them are proper solutions).

The great bulk of "literature" and textbooks that is promoted and marketed as "relational" is in fact anti-relational. Pre-1970 Record Filing Systems with relational labels, which is evidently all that the many authors and "theoreticians" can understand. The main difference between such pre-relational filing systems and the Relational Model is, the references are logical, not physical.

Whereas in pre-relational systems, physical records were related by physical pointers, such as ID (INTEGER; GUID; UUID; etc), in the Relational paradigm, logical rows (not records) are related by Relational Key.
A Relational Key is required to be made up from the data, which is logical, not physical (physical pointers are not data).

The Record IDs serve no purpose in a Relational database. Their purpose is to propagate what the "theoreticians" understand:

to confuse you
to maintain a pre-1970, pre-relational, physical mindset
and thus prevent a Relational mindset and database

At the least, each such Record ID will be one additional column and one additional index.

Additional, not instead of, because the indices on the logical data (Keys) are required for data Integrity, otherwise you will have duplicates and no integrity.

Therefore, your requirement elevated to Relational, with one less column and one less index in each table:

Katai 1

Note

All my data models are rendered in IDEF1X, the Standard for modelling Relational databases.
- IDEF1X Introduction for beginners
- IDEF1X Model Anatomy for the experienced, who are seeking a refresher

2 Relational Key that is Required

The second thing that needs to be understood is the Intergrity Features of Relational Keys, which you will not have, due to the "theoreticians" being blissfully unaware of it. This is an explanation relevant to the Question, rather than a full tutorial. Assuming that you do want Data Integrity and Relational Integrity (the logical feature provided by the Relational Model (as distinct from Referential Integrity, which is a physical facility provided by SQL). Which we will need, in order to control (provide integrity to) the data structures that are required 'lower' in the data hierarchy, for "Storage".

While the above [1] is correct for that purpose, we have an additional requirement: we need the FieldType in the FormField.PK such that it can be migrated to its child tables.
But of course, we must not subtract from the logical FormField.PK above [1]. If we do, it would lose its integrity and meaning.
So we:
- add the Relational Key that is required [2]
- make it Primary Key because that is what is migrated
- and we maintain the original Primary Key [1] by making it an Alternate Key
- which changes the relation FieldType::FormField from Non-Identifying to Identifying.

Katai 2

3 Relational Solution • Subtype

Now the solution is straightforward.

I see two solutions to this [A] or [B]

Definitely [B], with a couple of improvements to bring it up to Standard.

[A] is an unnormalised mess that has no integrity, and it will be a maintenance nightmare. Nulls are very problematic, Nulls in a Foreign Key is suicide.

but [B] will require a lot more complexity when storing and fetching data, since the program has to know the type, use the right table, etc etc.

Storage

Not at all.

Bear with me, the solution is below
You clearly understand that the data ("storage") tables have to be typed (or cast)
It requires an ordinary Exclusive Subtype structure. (We have had Subtypes in DBMSs since the 1960's.)
I have provided that typing (casting) control in the Basetype
- first, by inserting FieldType in the parent FormField Primary Key, so that it will be migrated
- second, by using a Non-Identifying relation, so that it is not in the Key of the child UserField
- but it remains non-key column, and it is available as a Discriminator for the Basetype
Now, it is possible to understand why [1][ and [2] had to be implemented first
See if you can appreciate the Relational Integrity (as distinct from Referential Integrity) that Relational Keys provide:
- UserField is CONSTRAINED to FieldNames that exist in the applicable FormName
- that level of logical data integrity, which is standard fare in Relational databases, cannot be had in the pre-relational physical (Record ID based) filing systems that the "theoreticians" fraudulently market as "relational"
Subtype Exposition for beginners

Further, you should be using ACID Transactions (a requirement for SQL Compliance since the first release), not INSERT/UPDATE/DELETE directly from the program. That would simplify both the app and the program code, as well as eliminate the otherwise thousands of errors. If implemented to OLTP Standard, it minimises locking and contention, and reduces, if not eliminates, deadlocks.

Retrieval

Not really. It may not be simple, it is definitely not complex, but it is straight-forward.

Use VIEW (this is a classic use of VIEWS)
- for retrieving a known Subtype:
  one View for each Subtype
- for retrieving an unknown Subtype:
  one View (with an UNION) for any Subtype (which will return just one for each Basetype, whichever Subtype it is)
  (This is not a "supertype", the anti-relational concept beloved of the "theoreticians", heaven forbid.)
That will make the SQL coding also straight-forward.

Katai 3

One More

All Data Models in a single PDF.