Reputation: 2331
I have a table with three columns x, y, z
. I'd like to write a query that, within each PARTITION BY x
, returns the rows containing the first n
distinct values of y
.
Here's a sample for n = 2
-- the first 2 distinct values of y
in the first partition are 1 and 2, and 4 and 5 in the second partition, so all rows with those values of y
are included.
x y z included?
----------------------
1 1 1 true
1 1 2 true
1 2 3 true
1 2 4 true
1 3 5 false
1 3 6 false
2 4 7 true
2 4 8 true
2 5 9 true
2 5 10 true
2 6 11 false
2 6 12 false
There's a related question that deals with selecting n
rows from each partition, but that doesn't deal with the distinct values part.
Upvotes: 0
Views: 456
Reputation: 10976
You can use a combination of desnse_rank
and row_number
to eliminate the duplicates:
with a as (
select
x, y, z,
dense_rank() over (partition by x order by y) rk,
row_number() over (partition by x, y order by z) rn
from
t
) select
x, y, z
from
a
where
rk <= 2 and
rn = 1;
this generates the 1, 2, 4, 5
from this you can get the desired results by joining back onto t:
with a as (
select
x, y, z,
dense_rank() over (partition by x order by y) rk,
row_number() over (partition by x, y order by z) rn
from
t
) select
t.*
from
t
where
exists (
select
'x'
from
a
where
a.y = t.y and
a.rk <= 2 and
a.rn = 1
);
Although, using exists
in this way, makes the duplicates irrelevant, so you can just do:
with a as (
select
x, y, z,
dense_rank() over (partition by x order by y) rk
from
t
) select
t.*
from
t
where
exists (
select
'x'
from
a
where
a.y = t.y and
a.rk <= 2
);
Upvotes: 2
Reputation: 1269823
I"m not sure what you mean by "first". SQL tables represent unordered sets. So, I'll assume you mean "smallest".
You can do this using dense_rank()
:
select t.*
from (select t.*, dense_rank() over (partition by x order by y) as seqnum
from atable t
) t
where seqnum <= 2;
Upvotes: 2