Reputation: 582
I was wondering if it's possible to use SQL (preferably snowflake) to select up to N records given certain criteria. To illustrate: Lets say I have a table with 1 million records, containing full names and phone numbers.
There's no limits on the amount of phone numbers that can be assigned to X person, but I only want to select up to 10 numbers per person, even if the person has more than 10.
Notice I don't want to select just 10 records, I want the query to return every name in the table, I only want to ignore extra phone numbers when the person already has 10 of them.
Can this be done?
Upvotes: 3
Views: 3660
Reputation: 1321
This query will help your requirement:
select
full_name,
phonenumber
from
(select
full_name,
phonenumber,
ROW_NUMBER() over (partition by phonenumber order by full_name desc) as ROW_NUMBER from sample_tab) a
where
a.row_number between 1 and 10
order by
full_name asc,
phonenumber desc;
using Snowflake Qualify function:
select
full_name,
phonenumber
from
sample_tab qualify row_number() over (partition by phonenumber order by full_name) between 1 and 10
order by
full_name asc ,
phonenumber desc;
Upvotes: 2
Reputation: 222402
You can use window functions to solve this greatest-n-per-group problem:
select t.*
from (
select
t.*,
row_number() over(partition by name order by phone_number) rn
from mytable t
) t
where rn <= 10
Note that you need an ordering column to define what "top 10" actually means. I assumed phone_number
, but you can change that to whatever suits your use case best.
Better yet: as commented by waldente, snowflake has the qualify
syntax, which removes the need for a subquery:
select t.*
from mytable t
qualify row_number() over(partition by name order by phone_number) <= 10
Upvotes: 7