Alex
Alex

Reputation: 2780

Grouped random sample MS SQL Server

Lets say that I want to randomly sample a database with id of the following form

id 
A    
A
A
B
C
C
D
D
E
F

Instead of doing a completely random sample how would I sample 50% of the distinct groups. The difference is that a random sample would give me half of all of the records while a grouped sample would give me half of all of the distinct ids.

This is necessary because I would not want to partially sample an id. If an A is selected then all 3 As should be included in this kind of grouped random sample.

Lets say for example the group set {A,B,C} was randomly selected. The query would produce the following

id
A
A
A
B
C
C

Upvotes: 2

Views: 70

Answers (1)

Martin Smith
Martin Smith

Reputation: 453057

You can use

SELECT *
FROM   YourTable
WHERE  [id] IN (SELECT TOP 50 PERCENT [id]
                FROM   YourTable
                GROUP  BY [id]
                ORDER  BY CRYPT_GEN_RANDOM(4)) 

Upvotes: 1

Related Questions