sat
sat

Reputation: 87

How to get random sample out of a table data that has selective number of variables in a column out of all the variables?

The below query is the one I have used to get random sample of 50,000 rows of a table, by limiting the 50,000 rows only to 7 variables in column2, which are 'zzz', 'xxxx', 'yyyy', 'hhhh', 'ggggg','kkkk','ooo'. But the below query is not working quite good as it is randomly selecting variables amongst the 7 variables and only displaying 2 or 3 variables for the most part. I would like the query to changed in such a way that it shows a 50,000 rows random sample that has all 7 variables in it

 select top 50000 v.column1, v.column2, v.column3, v.column 4

 from (select top 40 percent distinct  ceo.column1, ceo.column2,cm.column 3, cm.column4

  from table1 ceo 

  inner join table2 cm  on cm.column1 = ceo.column1

  where isnumeric(ceo.column1) !=0

  and column2 IN ('zzz', 'xxxx', 'yyyy', 'hhhh', 'ggggg','kkkk','ooo'

  Order by newid()))v

Upvotes: 0

Views: 91

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270401

You can use window functions:

select column1, column2, column3, column4
from (select t.*,
             row_number() over (partition by column2 order by newid()) as seqnum
      from t
      where column2 in ('zzz', 'xxxx', 'yyyy', 'hhhh', 'ggggg', 'kkkk', 'ooo')
     ) t
where seqnum <= 50000;

Your question mentions one table but your code has more than one table -- making the question a bit unclear. However, the same logic would work for a join query in the subquery.

Upvotes: 1

ildanny
ildanny

Reputation: 391

The easiest solution could be an union all

SELECT * from 
(select top 7000 .... where column2 = 'zzz' ORDER BY newID()) A
union all 
SELECT * from
(select top 7000 .... where column2 = 'xxxx'  ORDER BY newID()) B

From a performance perspective, you could also tak in consideration SELECT FROM table1 TABLESAMPLE (7000 ROWS)

Upvotes: 1

Related Questions