from each partition, select rows with N distinct values

Question

I have a table with three columns x, y, z. I'd like to write a query that, within each PARTITION BY x, returns the rows containing the first n distinct values of y.

Here's a sample for n = 2 -- the first 2 distinct values of y in the first partition are 1 and 2, and 4 and 5 in the second partition, so all rows with those values of y are included.

 x   y   z   included?
----------------------
 1   1   1    true
 1   1   2    true
 1   2   3    true
 1   2   4    true
 1   3   5    false
 1   3   6    false
 2   4   7    true
 2   4   8    true
 2   5   9    true
 2   5  10    true
 2   6  11    false
 2   6  12    false

There's a related question that deals with selecting n rows from each partition, but that doesn't deal with the distinct values part.

Gordon Linoff · Accepted Answer

I"m not sure what you mean by "first". SQL tables represent unordered sets. So, I'll assume you mean "smallest".

You can do this using dense_rank():

select t.*
from (select t.*, dense_rank() over (partition by x order by y) as seqnum
      from atable t
     ) t
where seqnum <= 2;

from each partition, select rows with N distinct values

Answers (2)

Related Questions