Reputation: 83
For example, the table looks like
a | b | c |
---|---|---|
1 | 1 | 1 |
2 | 1 | 1 |
3 | 1 | 1 |
4 | 1 | 1 |
5 | 1 | 1 |
6 | 1 | 1 |
7 | 1 | 1 |
8 | 1 | 1 |
9 | 1 | 1 |
10 | 1 | 1 |
11 | 1 | 1 |
I want to randomly pick 2 rows from every interval based on column a, where a ~ [0, 2], a ~ [4, 6], a ~ [9-20].
Another more complicated case would be select n rows from every interval based on multiple columns, for example in this case the interval will be a ~ [0, 2], a ~ [4, 6], b ~ [7, 9], ...
Is there a way to do so with just SQL?
Upvotes: 2
Views: 253
Reputation: 59165
Find out to which interval each row belongs, order by random partitioned by an interval id, get the top n rows for each interval:
create transient table mytable as
select seq8() id, random() data
from table(generator(rowcount => 100)) v;
create transient table intervals as
select 0 i_start, 6 i_end, 2 random_rows
union all select 7, 20, 1
union all select 21, 30, 3
union all select 31, 50, 1;
select *
from (
select *
, row_number() over(partition by i_start order by random()) rn
from mytable a
join intervals b
on a.id between b.i_start and b.i_end
)
where rn<=random_rows
Edit: Shorter and cleaner.
select a.*
from mytable a
join intervals b
on a.id between b.i_start and b.i_end
qualify row_number() over(partition by i_start order by random()) <= random_rows
Upvotes: 1
Reputation: 1269443
To get two rows per group, you want to use row_number()
. To define the groups, you can use a lateral join to define the groupings:
select t.*
from (select t.*,
row_number() over (partition by v.grp order by random()) as seqnum
from t cross join lateral
(values (case when a between 0 and 2 then 1
when a between 4 and 6 then 2
when a between 7 and 9 then d
end)
) v(grp)
where grp is not null
) t
where seqnum <= 2;
You can adjust the case
expression to define whatever groups you like.
Upvotes: 0