Reputation: 1529
My query returns a result like shown in the table. I would like to randomly pick an ID from the ID
column and get all the rows having that ID
. How can I do that in SnowFlake or SQL:
ID | Postalcode | Value | ... |
---|---|---|---|
1e3d | NK25F4 | 3214 | ... |
1e3d | NK25F4 | 3258 | ... |
1e3d | NK25F4 | 3354 | ... |
1f74 | NG2LK8 | 5524 | |
1f74 | NG2LK8 | 5548 | |
3e9a | N6B7H4 | 3694 | |
3e9a | N6B7H4 | 3325 | |
38e4 | N6C7H2 | 3654 | ... |
Upvotes: 1
Views: 245
Reputation: 5803
You can also play with the window frame
a little and let qualify
do the work
select *
from your_table
qualify id=first_value(id) over (order by random() rows between unbounded preceding and unbounded following)
Snowflake deviates from ANSI standard on the default window frames for rank-related functions (first_value
, last_value
, nth_value
), so that makes the above equivalent to :
select *
from your_table
qualify id=first_value(id) over (order by random())
Upvotes: 0
Reputation: 25938
There is a Snowflake function to return a fix number of "random" rows SAMPLE, so using that will reduce the need to read all rows.
SELECT t.*
FROM your_table as t
JOIN (SELECT ID FROM your_table SAMPLE (1 ROWS)) as r
ON t.id = r.id
thus using your data above:
with your_table(id, postalcode, value) as (
select * from values
('1e3d', 'NK25F4', 3214),
('1e3d', 'NK25F4', 3258),
('1e3d', 'NK25F4', 3354),
('1f74', 'NG2LK8', 5524),
('1f74', 'NG2LK8', 5548),
('3e9a', 'N6B7H4', 3694),
('3e9a', 'N6B7H4', 3325),
('38e4', 'N6C7H2', 3654)
)
I get (random set) but one looks like:
ID | POSTALCODE | VALUE |
---|---|---|
1f74 | NG2LK8 | 5,524 |
1f74 | NG2LK8 | 5,548 |
You could also use a NATURAL JOIN like:
SELECT *
FROM your_table
NATURAL JOIN (SELECT ID FROM your_table SAMPLE (1 ROWS))
Upvotes: 3
Reputation: 1804
WITH DATA AS (
select '1e3d' id,'NK25F4' postalcode,3214 some_value union all
select '1e3d' id,'NK25F4' postalcode,3258 some_value union all
select '1e3d' id,'NK25F4' postalcode,3354 some_value union all
select '1f74' id,'NG2LK8' postalcode,5524 some_value union all
select '1f74' id,'NG2LK8' postalcode,5548 some_value union all
select '3e9a' id,'N6B7H4' postalcode,3694 some_value union all
select '3e9a' id,'N6B7H4' postalcode,3325 some_value union all
select '38e4' id,'N6C7H2' postalcode,3654 some_value )
SELECT * FROM DATA ,LATERAL (SELECT ID FROM DATA SAMPLE(2 ROWS)) I WHERE I.ID = DATA.ID
Upvotes: 0
Reputation: 695
Something like
SELECT *
FROM Table_NAME
WHERE ID IN (SELECT ID FROM Table_Name ORDER BY RAND() LIMIT 1);
Should work. Though it's not particularly efficient and in many application scenarios it would arguably be more reasonable overall to compute the random ID in your application (e.g. keeping the set of all ids cached, periodically pulling it separately if need be etc).
(Note: The query assumes MYSQL, other variants may have slightly different keywords/structure, e.g. for the random function).
Upvotes: 1
Reputation: 222482
You could put your existing query in a common table expression, then pick a random ID from it, and use it to filter the dataset:
with
dat as ( ... your query ...),
tid as (select id from dat order by random() fetch first 1 row)
select d.*
from dat d
inner join tid t on t.id = d.id
The second CTE, tid
picks the random id; it does that by randomly ordering the dataset, then getting the id of the top row.
Upvotes: 1