Multiple DISTINCT ON clauses in PostgreSQL

Question

Is it possible to select rows that are DISTINCT ON some separate, independent sets of columns?

Suppose I want all the rows which match the following conditions:

distinct on (name, birth)
distinct on (name, height)

So that, out of the following table, the rows marked with a red cross would not be distinct (with an indication of the failing clause):

name      birth    height
--------------------------
William    1976      1.82
James      1981      1.68
Mike       1976      1.68
Tom        1967      1.79
William    1976      1.74   ❌ (name, birth)
William    1981      1.82   ❌ (name, height)
Tom        1978      1.92
Mike       1963      1.68   ❌ (name, height)
Tom        1971      1.86
James      1981      1.77   ❌ (name, birth)
Tom        1971      1.89   ❌ (name, birth)

In the above example, if the DISTINCT ON clause had just been DISTINCT ON (name, birth, height), then all the rows would have been considered distinct.

Tried and didn't work:

SELECT DISTINCT ON (name, birth) (name, height) ...
SELECT DISTINCT ON (name, birth), (name, height) ...
SELECT DISTINCT ON ((name, birth), (name, height)) ...
SELECT DISTINCT ON (name, birth) AND (name, height) ...
SELECT DISTINCT ON (name, birth) AND ON (name, height) ...
SELECT DISTINCT ON (name, birth) DISTINCT ON (name, height) ...
SELECT DISTINCT ON (name, birth), DISTINCT ON (name, height) ...

Erwin Brandstetter · Accepted Answer

As commented, there is ambiguity in the question. The number of result rows can differ for every call. If you are satisfied with arbitrary results, @klin's solution is good enough. Else, you need to define requiremens more closely. Like:
distinct on (name, birth), pick smallest height first, then smallest ID as tiebreaker

Or:
distinct on (name, height), pick earliest birth first, then smallest ID as tiebreaker

Your table should have a primary key (or some way to identify rows uniquely):

CREATE TEMP TABLE tbl (
  tbl_id serial PRIMARY KEY
, name text
, birth int
, height numeric);

INSERT INTO tbl (name, birth, height)
VALUES
  ('William', 1976, 1.82)
, ('James',   1981, 1.68)
, ('Mike',    1976, 1.68)
, ('Tom',     1967, 1.79)
, ('William', 1976, 1.74)
, ('William', 1981, 1.82)
, ('Tom',     1978, 1.92)
, ('Mike',    1963, 1.68)
, ('Tom',     1971, 1.86)
, ('James',   1981, 1.77)
, ('Tom',     1971, 1.89);

Query:

SELECT DISTINCT ON (name, height) *
FROM  (
   SELECT DISTINCT ON (name, birth) *
   FROM   tbl
   ORDER  BY name, birth, height, tbl_id  -- pick smallest height, ID as tiebreaker
   ) sub
ORDER  BY name, height, birth, tbl_id;    -- pick earliest birth, ID as tiebreaker

 tbl_id |  name   | birth | height
--------+---------+-------+--------
      2 | James   |  1981 |   1.68
      8 | Mike    |  1963 |   1.68
      4 | Tom     |  1967 |   1.79
      9 | Tom     |  1971 |   1.86
      7 | Tom     |  1978 |   1.92
      5 | William |  1976 |   1.74
      6 | William |  1981 |   1.82
(7 rows)    -- !!!

A DISTINCT ON query without deterministic ORDER BY can return any arbitrary row from each set of dupes. Applied once, you still get a deterministic number of rows (with arbitrary picks). Applied repeatedly, the resulting number of rows is arbitrary, too. Related:

Select first row in each GROUP BY group?

Multiple DISTINCT ON clauses in PostgreSQL

Answers (2)

Related Questions