Reputation: 586
I need to randomly sample users in a table into 4 equal groups using SQL from a table. For that I did the below:
WITH randomised_users AS (
SELECT *
FROM users_table
ORDER BY RANDOM()
) SELECT *,
ntile(4) OVER(ORDER BY (SELECT 1)) AS tile_nr
FROM randomised_users
Is this approach of sampling correct or is there a chance for bias in the 4 groups created from this?
Upvotes: 0
Views: 1290
Reputation: 5803
What you have looks fine to me. You don't need a subquery BTW. This will do just fine
select *, ntile(4) over (order by random())
Snowflake doesn't guarantee the query will reproduce the same result set even if you provide a random seed so make sure to dump any intermediate result set into a temp table if you plan on re-using it.
Upvotes: 1