Manngo

Reputation: 16311

SQL Null: Preferred way of distinguishing between unknown and no value

Suppose I have a simple table such as:

CREATE TABLE authors (
    id INT PRIMARY KEY,
    name VARCHAR(40),
    born DATE,
    died DATE
);

Both the born and died dates are optional, so they can contain NULL.

Presumably, authors who haven’t been born yet won’t make it into the table, so the obvious way to interpret a NULL born date is that the date is unknown.

However, there is some ambiguity with the died date. A null could imply that the date is unknown, or that the author has yet to die.

What is the preferred way to allow for both an unknown date and an event which has not yet occurred?

For what it’s worth, I routinely use PostgreSQL, MySQL/MariaDB and SQL Server, so this question isn’t specific to a particular DBMS.

Upvotes: 1

Views: 91

Answers (3)

Gordon Linoff

Reputation: 1269953

This elaborates on Tim's answer. I think the best approach is:

CREATE TABLE authors (
    id INT PRIMARY KEY,
    name VARCHAR(40),
    isDead int not null,  -- well, could be a bit or boolean or tinyint
    bornDate DATE,
    diedDate DATE,
    constraint chk_authors_isDead check (isDead in (0, 1)),  -- valid values
    constraint chk_authors_isDead_diedDate check (isDead = 1 or diedDate is NULL),
    constraint check_authors_bornDate_diedDate check (bornDate <= diedDate)  -- you might want to require that they are actually old enough to have written something
);

This code validates the following conditions:

  • The new isDead column only takes on the values of 0 and 1.
  • When diedDate is not NULL, isDead must be 1.
  • The author was born on or before the date they died.

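Check constraints are standard SQL and supported by most databases. MySQL is the notable exception: before version 8.0.16 it parsed check constraints but silently ignored them (MariaDB enforces them from 10.2.1), so on older versions these rules are not enforced.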
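To make the three states concrete, here is a small sketch of rows for the authors table defined above. The names and dates are made up for illustration, and the last statement is left commented out because it should be rejected by chk_authors_isDead_diedDate on any DBMS that enforces check constraints.

```sql
-- Hypothetical sample rows; names and dates are illustrative only.

-- Dead, date of death known.
INSERT INTO authors (id, name, isDead, bornDate, diedDate)
VALUES (1, 'Author A', 1, '1900-01-01', '1980-06-15');

-- Dead, date of death unknown: isDead = 1 with a NULL diedDate.
INSERT INTO authors (id, name, isDead, bornDate, diedDate)
VALUES (2, 'Author B', 1, NULL, NULL);

-- Still living: isDead = 0, so diedDate must be NULL.
INSERT INTO authors (id, name, isDead, bornDate, diedDate)
VALUES (3, 'Author C', 0, '1970-03-20', NULL);

-- Expected to fail: a death date on an author flagged as living.
-- INSERT INTO authors (id, name, isDead, bornDate, diedDate)
-- VALUES (4, 'Author D', 0, '1950-01-01', '2000-01-01');
```

With this layout, a NULL diedDate is only ambiguous until you look at isDead, which is the point of the extra column.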

Upvotes: 1

Manngo

Reputation: 16311

A possible solution would be to remove the died column and add the following table:

CREATE TABLE deaths (
    id INT PRIMARY KEY REFERENCES authors(id),
    died DATE
);

In this case, no record could imply that the author has not died, while a NULL in died would imply that the author has died, but the date is unknown.

The data could then be extracted as follows:

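SELECT
    a.id, a.name, a.born,
    -- No matching deaths row means the author is still living.
    -- The date is cast to text so both CASE branches return the same type.
    CASE WHEN d.id IS NULL THEN 'living'
         ELSE COALESCE(CAST(d.died AS CHAR(10)), 'unknown') END AS died
FROM authors a LEFT JOIN deaths d ON a.id = d.id;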

The d.id column is used to determine whether there is a matching row, otherwise the author is considered living. If there is a matching row, the result is either the died date or a coalesced alternative.

Upvotes: 1

Tim Biegeleisen

Reputation: 521429

One way to handle this would be to add a new column deceased which indicates whether or not a given author has died. This frees your died date field from being responsible for recording whether an author has died. Under this design, if an author has not yet died (deceased is false), then we don't care about the date of death. And if an author has died, then a NULL value would only mean that the date really is unknown.
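As a rough sketch of how such a flag might be read back, the query below assumes the original authors table (with born and died) has gained a NOT NULL deceased column stored as 0 or 1. That representation is an assumption; a BOOLEAN or BIT column would work the same way with the comparison adjusted. The death_status alias is also just illustrative.

```sql
-- Assumes: authors.deceased is NOT NULL, 1 = has died, 0 = still living.
SELECT
    id,
    name,
    CASE
        WHEN deceased = 0 THEN 'living'               -- died is irrelevant here
        WHEN died IS NULL THEN 'died, date unknown'   -- deceased, but no date on record
        ELSE CAST(died AS CHAR(10))                   -- deceased, with a known date
    END AS death_status
FROM authors;
```

The date is cast to text only so that every CASE branch returns the same type.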

Upvotes: 1
