user223304
user223304

Reputation: 111

DB Design Question about Nullable Foreign Keys and Normalization

I'm hoping to get a consensus about what db schema is best for my situation to store 'type' information for Widgets in a table. A Widget can have only one type but that type can be either a Preset-Type OR a Custom-Type. I obviously create the preset types and a user would create a custom type.

I'll be using MySQL and INNODB on a server. I'll also be using SQLite to store the same info on an App. But we'll just talk about the server here. I'm an app programmer, not a DB admin, but want to get the DBs right for this project the first time and normalized within reason.

In my searching on whether or not I should use nulls for foreign keys I've come across the following answers from people who have much more DB experience than I do.

I need to know if it's bad practice to use Nulls in the particular case of Model #2 and which model is preferable and why. Or possibly suggest a better model. Thanks for any input.

Model #1

Have one 'types' table for both Preset and Custom Types. I do this by pre-populating the 'types' table with preset types and leaving around 1500 reserved spaces for future Preset-types that I could later add.

Pros: Easy, no extra tables, no joins, probably the fastest option, and prob less db space in the long run (4 byte type_id). And widgets table type_id FK will never be NULL.

Cons: Probably not good normalization practice to mix preset and custom types together since presets don't need some fields like 'account_id', etc. If i ever wanted more than 1500 presets (highly unlikely) I'd need to figure something else out. This model also uses sentinel/placeholder values in the types table for presets and reserved preset spots.

CREATE TABLE accounts (
    account_id  INT UNSIGNED AUTO_INCREMENT NOT NULL, 
    # Other Columns...,
    PRIMARY KEY (account_id)
    );  
CREATE TABLE widgets (
    widget_id   INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id  INT UNSIGNED NOT NULL, 
    type_id     INT UNSIGNED NOT NULL,
    PRIMARY KEY (widget_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id) ON DELETE CASCADE,
    FOREIGN KEY (type_id) REFERENCES types(type_id)
    );  
CREATE TABLE types (
    type_id     INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id  INT UNSIGNED NOT NULL, 
    name        VARCHAR(100) NOT NULL,
    PRIMARY KEY (type_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id)
    );

Model #2

Separate widget-type table for preset and custom types. 'widgets' table has nullable FK field for preset type and custom type. A Check constraint makes sure one of them is null and the other is not.

Pros: only 1 extra table in the DB. No sentinal/placeholder values except maybe a nulled FK. Don't need to reserve preset value spaces and no limit to future preset type additons.

Cons: Uses one FK null per record in widgets table for either preset_type_id or custom_type_id.

CREATE TABLE accounts (
    account_id  INT UNSIGNED AUTO_INCREMENT NOT NULL, 
    # Other Columns...,
    PRIMARY KEY (account_id)
    );  
CREATE TABLE widgets (
    widget_id       INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id      INT UNSIGNED NOT NULL, 
    preset_type_id  INT UNSIGNED DEFAULT NULL,
    custom_type_id  INT UNSIGNED DEFAULT NULL,
    PRIMARY KEY (widget_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id) ON DELETE CASCADE,
    FOREIGN KEY (preset_type_id) REFERENCES preset_types(preset_type_id),
    FOREIGN KEY (custom_type_id) REFERENCES custom_types(custom_type_id),
    CHECK ((preset_type_id IS NOT NULL AND custom_type_id IS NULL) OR (preset_type_id IS NULL AND custom_type_id IS NOT NULL) )
    );  
CREATE TABLE preset_types (
    preset_type_id  INT UNSIGNED AUTO_INCREMENT NOT NULL, 
    name            VARCHAR(100) NOT NULL,
    PRIMARY KEY (preset_type_id)
    );  
CREATE TABLE custom_types (
    custom_type_id  INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id      INT UNSIGNED NOT NULL, 
    name            VARCHAR(100) NOT NULL,
    PRIMARY KEY (custom_type_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id)
    );

Model #3

Uses intermediary tables widget_preset_types and widget_custom_types. If a widget has a preset type it will be referenced in the widget_preset_types table, alternatively if the widget has a custom type it will be referenced in the widget_custom_types table.

Pros: Probably the most normalized model. Never uses Nulls or FK Nulls. No sentinal/placehodler values used.

Cons: Adds 3 extra tables in the DB just to determine widget-type. I have other things besides widgets in my DB with custom/preset types which means I could be adding at least 12 extra tables to my DB with this model. Is it over-normalized? I'll have to use some type of join to get all widget info and type info from 3 tables at the same time. I'll have to check whether or not a custom_type_id or preset_type_id comes back in the join probably using more code than I would have used checking for nulls in Model#2. Probably slower than Model 1 & 2. More tables means more indexes means more ram.

CREATE TABLE accounts (
    account_id  INT UNSIGNED AUTO_INCREMENT NOT NULL, 
    # Other Columns...,
    PRIMARY KEY (account_id)
    );  
CREATE TABLE widgets (
    widget_id       INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id      INT UNSIGNED NOT NULL
    PRIMARY KEY (widget_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id) ON DELETE CASCADE
    );
CREATE TABLE preset_types (
    preset_type_id  INT UNSIGNED AUTO_INCREMENT NOT NULL, 
    name            VARCHAR(100) NOT NULL,
    PRIMARY KEY (preset_type_id)
    );  
CREATE TABLE custom_types (
    custom_type_id  INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id      INT UNSIGNED NOT NULL, 
    name            VARCHAR(100) NOT NULL,
    PRIMARY KEY (custom_type_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id) ON DELETE CASCADE
    );
CREATE TABLE widget_preset_types (
    widget_id       INT UNSIGNED NOT NULL UNIQUE, 
    preset_type_id  INT UNSIGNED NOT NULL, 
    PRIMARY KEY (widget_id),
    FOREIGN KEY (widget_id) REFERENCES widgets(widget_id) ON DELETE CASCADE,
    FOREIGN KEY (preset_type_id) REFERENCES preset_types(preset_type_id)
    );  
CREATE TABLE widget_custom_types (
    widget_id       INT UNSIGNED NOT NULL UNIQUE, 
    custom_type_id  INT UNSIGNED NOT NULL,
    PRIMARY KEY (widget_id),
    FOREIGN KEY (widget_id) REFERENCES widgets(widget_id) ON DELETE CASCADE,
    FOREIGN KEY (custom_type_id) REFERENCES custom_types(custom_type_id)
    );

Upvotes: 1

Views: 1684

Answers (2)

Walter Mitty
Walter Mitty

Reputation: 18940

Some very good designers use NULLs in foreign keys with no adverse consequences. I lean that way myself. A nullable FK represents an optional relationship. In instances where the entity has no relationship, the FK contains a NULL. The space overhead is minimal. When joins (equijoins, more precisely) are done across the two tables, instances containing NULL in the FK will drop out of the join, and that's appropriate.

Having said that, I'm going to recommend a fourth method to you. This involves a total of 4 tables, accounts, widgets, types, and custom_types. The custom_types table uses a technique called Shared-primary-key, outlined below.

CREATE TABLE accounts (
    account_id  INT UNSIGNED AUTO_INCREMENT NOT NULL, 
    # Other Columns...,
    PRIMARY KEY (account_id)
    );  
CREATE TABLE widgets (
    widget_id   INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id  INT UNSIGNED NOT NULL, 
    type_id     INT UNSIGNED NOT NULL,
    PRIMARY KEY (widget_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id) ON DELETE CASCADE,
    FOREIGN KEY (type_id) REFERENCES types(type_id)
    );  
CREATE TABLE types (
    type_id     INT UNSIGNED AUTO_INCREMENT NOT NULL,
    account_id  INT UNSIGNED NOT NULL, 
    name        VARCHAR(100) NOT NULL,
    PRIMARY KEY (type_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id)
CREATE TABLE custom_types (
    type_id     INT NOT NULL,
    account_id  INT UNSIGNED NOT NULL, 
    PRIMARY KEY (type_id),
    FOREIGN KEY (type_id) REFERENCES types(type_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id)

);

The type_id column in custom_types is a shared primary key. Notice that it is declared BOTH as a primary key and as a foreign key, and that it doesn't use autonumber. It's a copy of the primary key in types for the corresponding entry. The custom types table contains all the data that is present in custom types but absent in preset types.

For preset types, an entry is made in types, but no entry is made in custom_types. For custom_types, an entry is first made in types, and then the resulting value of type_id is copied over into custom_types, along with the account_id.

If you INNER JOIN types and custom_types, the preset types drop out of the join. If you want both custom and preset types in a single join, you have to use a LEFT JOIN or a RIGHT JOIN to get that effect. Note that the result of a LEFT or RIGHT JOIN will contain some NULLs, even though those NULLs are not stored in the database.

Clicking on this will give you a more detailed description of the shared primary key technique.

Upvotes: 3

Rick James
Rick James

Reputation: 142366

get the DBs right for this project the first time

That's almost impossible. It is best to plan on revamping the schema a few months into the project.

Nulls vs FKs

First, decide what columns need an out-of-band "null" value. Only then decide whether a FK is usable, or is more hassle than it is worth.

Most columns will always be present, hence NOT NULL. But some can validly be NULL:

  • A value that is not yet known. Example: end_date
  • An optional value.
  • (and lots more)

Normalization

No normalization is usually bad;
Over-normalization is also usually bad;
The middle ground is hard to find until you have some feel for how many widgets there will be, how much their names vary, etc.

There are perhaps two reasons for normalizing:

  • Data Integrity (sort of). To move a commonly occurring thing (typically the 'name' of something or someone) into a single place, thereby making it easy and efficient to change the name when needed. This helps when Upper Volta changes its name to Burkina Faso, but is useless for the split up of Yugoslavia.
  • Performance. If a long name shows up in dozens of places and in lots of indexes, it takes disk space, I/O time and, to a lesser degree, CPU time.

For a thousand "things", normalization does not matter much.
For a billion "things", normalization is vital.
Again, tuning the middle-ground may not be possible on day-one.

Preset vs custom types

You present them as nearly-identical. But then you move on to suggest that 'preset' types are missing some attribute. ("Missing" == "Null"??)

Think of a "type" as an Entity. And your other tables have a Relation to it. It is probably many-to-many.

Do not "reserve 1500". If you need to distinguish preset vs custom, then add a column that says which is which. Reservations will eventually get you in trouble.

The existence of type.account_id (in the first schema) implies that a "type" is not used by multiple accounts. Yet preset types can be? Ugh! Thumbs down to Model 1.

Separate columns for reset and custom. That smacks of "a widget can have one of each". Thumbs down for Model 2.

Model 3 had multiple tables that smell like "many-to-many" mappings between widgets and each type of type. Did you really want many-to-many?

4 byte type_id

Sure, INT is 4 bytes. But do you really expect a billion or more "types"? Use a smaller datatype. For example MEDIUMINT UNSIGNED is 3 bytes and overflows at a generous 16 million. (Etc)

Starting over...

You have 3(?) Entities: Widgets, Accounts, Types. You need one table for each. (Or at least one primary table -- Example: An order is composed of many order_items.)

Those Entities have 1:1 or 1:many or many:many relationships. Decide which applies where.

  • 1:1 is usually 'wrong', so try to avoid that.
  • 1:many has an id for the "1" is in the "many" table.
  • many:many needs an extra table with a pair of ids.

The 3 Entity tables, plus any many:many Relation tables, make up the core of your schema. (Meanwhile, I don't see where "normalization" figures into your small schema.)

Then add on FOREIGN KEYs where desired. (FKs are not mandatory.)

In other words, I think you have the "cart before the horse" when you ask 'one question it would be "I need to know if it's bad practice to use Nulls"'.

Upvotes: 2

Related Questions