HuckleberryFinn
HuckleberryFinn

Reputation: 1529

How to select rows corresponding to a randomly selected column value in SQL

My query returns a result like shown in the table. I would like to randomly pick an ID from the ID column and get all the rows having that ID. How can I do that in SnowFlake or SQL:

ID Postalcode Value ...
1e3d NK25F4 3214 ...
1e3d NK25F4 3258 ...
1e3d NK25F4 3354 ...
1f74 NG2LK8 5524
1f74 NG2LK8 5548
3e9a N6B7H4 3694
3e9a N6B7H4 3325
38e4 N6C7H2 3654 ...

Upvotes: 1

Views: 245

Answers (5)

Rajat
Rajat

Reputation: 5803

You can also play with the window frame a little and let qualify do the work

select *
from your_table
qualify id=first_value(id) over (order by random() rows between unbounded preceding and unbounded following)

Snowflake deviates from ANSI standard on the default window frames for rank-related functions (first_value, last_value, nth_value), so that makes the above equivalent to :

select *
from your_table
qualify id=first_value(id) over (order by random())

Upvotes: 0

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25938

There is a Snowflake function to return a fix number of "random" rows SAMPLE, so using that will reduce the need to read all rows.

SELECT t.* 
FROM your_table as t
JOIN (SELECT ID FROM your_table SAMPLE (1 ROWS)) as r
    ON t.id = r.id

thus using your data above:

with your_table(id, postalcode, value) as (
    select * from values 
        ('1e3d', 'NK25F4', 3214),   
        ('1e3d', 'NK25F4', 3258),   
        ('1e3d', 'NK25F4', 3354),
        ('1f74', 'NG2LK8', 5524),   
        ('1f74', 'NG2LK8', 5548),   
        ('3e9a', 'N6B7H4', 3694),   
        ('3e9a', 'N6B7H4', 3325),   
        ('38e4', 'N6C7H2', 3654)
)

I get (random set) but one looks like:

ID POSTALCODE VALUE
1f74 NG2LK8 5,524
1f74 NG2LK8 5,548

You could also use a NATURAL JOIN like:

SELECT * 
FROM your_table
NATURAL JOIN (SELECT ID FROM your_table SAMPLE (1 ROWS))

Upvotes: 3

Adrian White
Adrian White

Reputation: 1804

enter image description here

WITH DATA AS (
select '1e3d' id,'NK25F4' postalcode,3214 some_value union all   
select '1e3d' id,'NK25F4' postalcode,3258 some_value union all 
select '1e3d' id,'NK25F4' postalcode,3354 some_value union all   
select '1f74' id,'NG2LK8' postalcode,5524 some_value union all  
select '1f74' id,'NG2LK8' postalcode,5548 some_value union all  
select '3e9a' id,'N6B7H4' postalcode,3694 some_value union all  
select '3e9a' id,'N6B7H4' postalcode,3325 some_value union all  
select '38e4' id,'N6C7H2' postalcode,3654 some_value )
SELECT * FROM DATA ,LATERAL (SELECT ID FROM DATA SAMPLE(2 ROWS)) I WHERE I.ID = DATA.ID  

Upvotes: 0

Frank Hopkins
Frank Hopkins

Reputation: 695

Something like

SELECT * 
FROM Table_NAME 
WHERE ID IN (SELECT ID FROM Table_Name ORDER BY RAND() LIMIT 1);

Should work. Though it's not particularly efficient and in many application scenarios it would arguably be more reasonable overall to compute the random ID in your application (e.g. keeping the set of all ids cached, periodically pulling it separately if need be etc).

(Note: The query assumes MYSQL, other variants may have slightly different keywords/structure, e.g. for the random function).

Upvotes: 1

GMB
GMB

Reputation: 222482

You could put your existing query in a common table expression, then pick a random ID from it, and use it to filter the dataset:

with
    dat as ( ... your query ...),
    tid as (select id from dat order by random() fetch first 1 row)
select d.*
from dat d
inner join tid t on t.id = d.id

The second CTE, tid picks the random id; it does that by randomly ordering the dataset, then getting the id of the top row.

Upvotes: 1

Related Questions