Jivan
Jivan

Reputation: 23078

Multiple DISTINCT ON clauses in PostgreSQL

Is it possible to select rows that are DISTINCT ON some separate, independent sets of columns?

Suppose I want all the rows which match the following conditions:

So that, out of the following table, the rows marked with a red cross would not be distinct (with an indication of the failing clause):

name      birth    height
--------------------------
William    1976      1.82
James      1981      1.68
Mike       1976      1.68
Tom        1967      1.79
William    1976      1.74   ❌ (name, birth)
William    1981      1.82   ❌ (name, height)
Tom        1978      1.92
Mike       1963      1.68   ❌ (name, height)
Tom        1971      1.86
James      1981      1.77   ❌ (name, birth)
Tom        1971      1.89   ❌ (name, birth)

In the above example, if the DISTINCT ON clause had just been DISTINCT ON (name, birth, height), then all the rows would have been considered distinct.

Tried and didn't work:

Upvotes: 11

Views: 14161

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657397

As commented, there is ambiguity in the question. The number of result rows can differ for every call. If you are satisfied with arbitrary results, @klin's solution is good enough. Else, you need to define requiremens more closely. Like:
distinct on (name, birth), pick smallest height first, then smallest ID as tiebreaker

Or:
distinct on (name, height), pick earliest birth first, then smallest ID as tiebreaker

Your table should have a primary key (or some way to identify rows uniquely):

CREATE TEMP TABLE tbl (
  tbl_id serial PRIMARY KEY
, name text
, birth int
, height numeric);

INSERT INTO tbl (name, birth, height)
VALUES
  ('William', 1976, 1.82)
, ('James',   1981, 1.68)
, ('Mike',    1976, 1.68)
, ('Tom',     1967, 1.79)
, ('William', 1976, 1.74)
, ('William', 1981, 1.82)
, ('Tom',     1978, 1.92)
, ('Mike',    1963, 1.68)
, ('Tom',     1971, 1.86)
, ('James',   1981, 1.77)
, ('Tom',     1971, 1.89);

Query:

SELECT DISTINCT ON (name, height) *
FROM  (
   SELECT DISTINCT ON (name, birth) *
   FROM   tbl
   ORDER  BY name, birth, height, tbl_id  -- pick smallest height, ID as tiebreaker
   ) sub
ORDER  BY name, height, birth, tbl_id;    -- pick earliest birth, ID as tiebreaker
 tbl_id |  name   | birth | height
--------+---------+-------+--------
      2 | James   |  1981 |   1.68
      8 | Mike    |  1963 |   1.68
      4 | Tom     |  1967 |   1.79
      9 | Tom     |  1971 |   1.86
      7 | Tom     |  1978 |   1.92
      5 | William |  1976 |   1.74
      6 | William |  1981 |   1.82
(7 rows)    -- !!!

A DISTINCT ON query without deterministic ORDER BY can return any arbitrary row from each set of dupes. Applied once, you still get a deterministic number of rows (with arbitrary picks). Applied repeatedly, the resulting number of rows is arbitrary, too. Related:

Upvotes: 14

klin
klin

Reputation: 121694

Use a derived table:

with my_table(name, birth, height) as (
values
('William',    1976,      1.82),
('James',      1981,      1.68),
('Mike',       1976,      1.68),
('Tom',        1967,      1.79),
('William',    1976,      1.74),  -- ? (name, birth)
('William',    1981,      1.82),  -- ? (name, height)
('Tom',        1978,      1.92),
('Mike',       1963,      1.68),  -- ? (name, height)
('Tom',        1971,      1.86),
('James',      1981,      1.77),  -- ? (name, birth)
('Tom',        1971,      1.89)   -- ? (name, birth)
)
select distinct on (name, height) *
from (
    select distinct on (name, birth) *
    from my_table
    ) s

  name   | birth | height 
---------+-------+--------
 James   |  1981 |   1.68
 Mike    |  1963 |   1.68
 Tom     |  1967 |   1.79
 Tom     |  1971 |   1.89
 Tom     |  1978 |   1.92
 William |  1976 |   1.82
(6 rows)        

Upvotes: 2

Related Questions