user3422637
user3422637

Reputation: 4229

Composite PRIMARY KEY enforces NOT NULL constraints on involved columns

This is one strange, unwanted behavior I encountered in Postgres: When I create a Postgres table with composite primary keys, it enforces NOT NULL constraint on each column of the composite combination.

For example,

CREATE TABLE distributors (m_id integer, x_id integer, PRIMARY KEY(m_id, x_id));

enforces NOT NULL constraint on columns m_id and x_id, which I don't want! MySQL doesn't do this. I think Oracle doesn't do it as well.

I understand that PRIMARY KEY enforces UNIQUE and NOT NULL automatically but that makes sense for single-column primary key. In a multi-column primary key table, the uniqueness is determined by the combination.

Is there any simple way of avoiding this behavior of Postgres? When I execute this:

CREATE TABLE distributors (m_id integer, x_id integer);

I do not get any NOT NULL constraints of course. But I would not have a primary key either.

Upvotes: 52

Views: 34552

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657192

If you need to allow NULL values, use a UNIQUE constraint (or index) instead of a PRIMARY KEY (and add a surrogate PK column - I suggest a serial or IDENTITY column in Postgres 10 or later).

A UNIQUE constraint allows columns to be NULL:

CREATE TABLE distributor (
  distributor_id GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, m_id integer
, x_id integer
, UNIQUE(m_id, x_id)  -- !
-- , CONSTRAINT distributor_my_name_uni UNIQUE (m_id, x_id)  -- verbose form
);

The manual:

For the purpose of a unique constraint, null values are not considered equal, unless NULLS NOT DISTINCT is specified.

In your case, you could enter something like (1, NULL) for (m_id, x_id) any number of times without violating the constraint. Postgres does not consider two NULL values equal - as per definition in the SQL standard.

If you need to treat NULL values as equal (i.e. "not distinct") to disallow such "duplicates", I see two three (since Postgres 15) options:

0. NULLS NOT DISTINCT

This option was added with Postgres 15 and allows to treat NULL values as "not distinct", so two of them conflict in a unique constraint or index. This is the most convenient option, going forward. The manual:

That means even in the presence of a unique constraint it is possible to store duplicate rows that contain a null value in at least one of the constrained columns. This behavior can be changed by adding the clause NULLS NOT DISTINCT ...

Detailed instructions:

1. Two partial indexes

In addition to the UNIQUE constraint above:

CREATE UNIQUE INDEX dist_m_uni_idx ON distributor (m_id) WHERE x_id IS NULL;
CREATE UNIQUE INDEX dist_x_uni_idx ON distributor (x_id) WHERE m_id IS NULL;

But this gets out of hands quickly with more than two columns that can be NULL. See:

2. A multi-column UNIQUE index on expressions

Instead of the UNIQUE constraint. We need a free default value that is never present in involved columns, like -1. Add CHECK constraints to disallow it:

CREATE TABLE distributor (
   distributor serial PRIMARY KEY
 , m_id integer
 , x_id integer
 , CHECK (m_id <> -1)
 , CHECK (x_id <> -1)
);
CREATE UNIQUE INDEX distributor_uni_idx
ON distributor (COALESCE(m_id, -1), COALESCE(x_id, -1));

Upvotes: 76

tanius
tanius

Reputation: 16809

In case you want a polymorphic relation

Your table uses column names that indicate that they are probably references to other tables:

CREATE TABLE distributors (m_id integer, x_id integer);

So I think you probably are trying to model a polymorphic relation to other tables – where a record in your table distributors can refer to one m record xor one x record.

Polymorphic relations are difficult in SQL. The best resource I have seen about this topic is "Modeling Polymorphic Associations in a Relational Database". There, four alternative options are presented, and the recommendation for most cases is called "Exclusive Belongs To", which in your case would lead to a table like this:

CREATE TABLE distributors (
  id serial PRIMARY KEY,
  m_id integer REFERENCES m,
  x_id integer REFERENCES x,
  CHECK (
    ((m_id IS NOT NULL)::integer + (x_id IS NOT NULL)::integer) = 1
  )
);

CREATE UNIQUE INDEX ON distributors (m_id) WHERE m_id IS NOT NULL;
CREATE UNIQUE INDEX ON distributors (x_id) WHERE x_id IS NOT NULL;

Like other solutions, this uses a surrogate primary key column because primary keys are enforced to not contain NULL values in the SQL standard.

This solution adds a 4th option to the three in @Erwin Brandstetter's answer for how to avoid the case where "you could enter something like (1, NULL) for (m_id, x_id) any number of times without violating the constraint." Here, that case is excluded by a combination of two measures:

  1. Partial unique indexes on each column individually: two records (1, NULL) and (1, NULL) would not violate the constraint on the second column as NULLs are considered distinct, but they would violate the constraint on the first column (two records with value 1).

  2. Check constraint: The missing piece is preventing multiple (NULL, NULL) records, still allowed because NULLs are considered distinct, and anyway because our partial indexes do not cover them to save space and write events. This is achieved by the CHECK constraint, which prevents any (NULL, NULL) records by making sure that exactly one column is NULL.

There's one difference though: all alternatives in @Erwin Brandstetter's answer allow at least one record (NULL, NULL) and any number of records with no NULL value in any column (like (1, 2)). When modeling a polymorphic relation, you want to disallow such records. That is achieved by the check constraint in the solution above.

Upvotes: 1

Related Questions