rcrogers
rcrogers

Reputation: 2331

from each partition, select rows with N distinct values

I have a table with three columns x, y, z. I'd like to write a query that, within each PARTITION BY x, returns the rows containing the first n distinct values of y.

Here's a sample for n = 2 -- the first 2 distinct values of y in the first partition are 1 and 2, and 4 and 5 in the second partition, so all rows with those values of y are included.

 x   y   z   included?
----------------------
 1   1   1    true
 1   1   2    true
 1   2   3    true
 1   2   4    true
 1   3   5    false
 1   3   6    false
 2   4   7    true
 2   4   8    true
 2   5   9    true
 2   5  10    true
 2   6  11    false
 2   6  12    false

There's a related question that deals with selecting n rows from each partition, but that doesn't deal with the distinct values part.

Upvotes: 0

Views: 456

Answers (2)

Laurence
Laurence

Reputation: 10976

You can use a combination of desnse_rank and row_number to eliminate the duplicates:

with a as (
  select
    x, y, z,
    dense_rank() over (partition by x order by y) rk,
    row_number() over (partition by x, y order by z) rn
  from
    t
) select
  x, y, z
from
  a
where
  rk <= 2 and
  rn = 1;

this generates the 1, 2, 4, 5

from this you can get the desired results by joining back onto t:

with a as (
  select
    x, y, z,
    dense_rank() over (partition by x order by y) rk,
    row_number() over (partition by x, y order by z) rn
  from
    t
) select
  t.*
from
  t
where
  exists (
    select
      'x'
    from
      a
    where
      a.y = t.y and
      a.rk <= 2 and
      a.rn = 1
  );

Example SQLFiddle

Although, using exists in this way, makes the duplicates irrelevant, so you can just do:

with a as (
  select
    x, y, z,
    dense_rank() over (partition by x order by y) rk
  from
    t
) select
  t.*
from
  t
where
  exists (
    select
      'x'
    from
      a
    where
      a.y = t.y and
      a.rk <= 2
  );

Example SQLFiddle

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269823

I"m not sure what you mean by "first". SQL tables represent unordered sets. So, I'll assume you mean "smallest".

You can do this using dense_rank():

select t.*
from (select t.*, dense_rank() over (partition by x order by y) as seqnum
      from atable t
     ) t
where seqnum <= 2;

Upvotes: 2

Related Questions