turbo
turbo

Reputation: 1263

Get count of duplicate rows in DISTINCT ON

In any SELECT query, where a DISTINCT ON is used, how can one additionally get the number of duplicates for each row in the result set?

Take e.g.

SELECT
  DISTINCT ON (building)
  building,
  name
FROM ...
WHERE ...

This will only return the first result for each building. I want to add another column, so the results look like this:

name | building | excluded
Fred | Office   | 0
Bob  | Storage  | 3

when there are more people than Bob in Storage. I'm using Postgres 10.

Upvotes: 1

Views: 2545

Answers (3)

user330315
user330315

Reputation:

You can use a window function:

with data (name, building) as (
  values 
    ('Bob', 'Storage'),
    ('Bob', 'Storage'),
    ('Bob', 'Storage'),
    ('Bob', 'Storage'),
    ('Fred', 'Office'),
    ('Tim', 'Home'),
    ('Tim', 'Home')
)
select distinct on (building) *, 
       count(*) over (partition by building) - 1 as excluded
from data
order by building;

returns:

name | building | excluded
-----+----------+---------
Tim  | Home     |        1
Fred | Office   |        0
Bob  | Storage  |        3

This works because the window function is evaluated before the distinct on ()

However this means doing some work twice. I think it might be faster to re-use the partitioning "work" to also filter out the duplicates:

with ranked as (
  select *, 
         count(*) over w - 1 as excluded, 
         row_number() over w as rn
  from your_table
  window w as (partition by building)
) 
select *
from ranked
where rn = 1;

Upvotes: 7

klin
klin

Reputation: 121604

You can simply use group by instead of distinct on (to avoid window functions):

with data (name, building) as (
  values 
    ('Bob', 'Storage'),
    ('Bob', 'Storage'),
    ('Bob', 'Storage'),
    ('Bob', 'Storage'),
    ('Fred', 'Office'),
    ('Tim', 'Home'),
    ('Tim', 'Home')
)   
select min(name), building, count(*)- 1 as excluded
from data
group by building
order by building;

 min  | building | excluded 
------+----------+----------
 Tim  | Home     |        1
 Fred | Office   |        0
 Bob  | Storage  |        3
(3 rows)

Upvotes: 1

gpeche
gpeche

Reputation: 22514

Use window functions?

select
first_value(name) over (partition by building order by /* your order */) first_name
first_value(building) over (partition by building order by  /* your order */) building,
count(*) over (partition by building order by /* your order */) - 1 as excluded
from (
    select name, building
    from my_source_table
);

Upvotes: -1

Related Questions