Mahdi Hesari
Mahdi Hesari

Reputation: 337

selecting random rows with normal distribution based on a column in SQL Server 2012

FULL DETAILS: let me explain more clear. this is a table including about 100 question. every question has a BooKRange property that shows from which part of the book, this question hast fetched with values 1,2,3,4. and there is another property called Level that shows level of the difficulty of the question with values 1,2,3,4,5. now i need to randomly select 20 question that have to include all four Book Ranges and all five levels with a normal distribution.

please consider that i need to select distinct rows.

thank you very much.

edit: added the table

CREATE TABLE [dbo].[Question] (
    [QuesID]    INT            IDENTITY (1, 1) NOT NULL,
    [BookRange] NVARCHAR (50)  NULL,
    [Level]     NVARCHAR (50)  NULL,
    PRIMARY KEY CLUSTERED ([QuesID] ASC)
);

Upvotes: 2

Views: 1978

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1270391

You can do this query (assuming a uniform distribution) without doing a union. You just need to specify the ordering correctly.

If you want to select 5 questions from each of the levels, then you can do so by assigning a sequential number to the questions in each level. If these are assigned randomly, then you should meet the requirement of randomness for the levels:

with q as (
      select q.*,
             row_number() over (partition by [range] order by newid()) as seqnum
      from Question q
     )
select *
from q
where seqnum <= 5;

If you want to ensure that these is exactly one question for each level and range, but want the questions random, then do:

with q as (
      select q.*,
             row_number() over (partition by [range], [level] order by newid()) as seqnum
      from Question q
     )
select *
from q
where seqnum = 1;

By the way, range and level are reserved words in SQL Server. In general, it is good practice to avoid using reserved words for the names of things like tables, columns, stored procedures, and so on.

Upvotes: 1

M.Ali
M.Ali

Reputation: 69554

Since you havent provided any table schema, Assuming we have a table dbo.Number with One column with values from 1 - 30 you could do something like this ...

;With NthGroups
AS
 (
  SELECT  * , NTILE(4) OVER (ORDER BY Nums) Np
  FROM dbo.Number
 ),
Top25Perc
AS
 (
 SELECT TOP 5 * FROM NthGroups 
 WHERE NP = 1
 ORDER BY NEWID()
 UNION ALL 
  SELECT TOP 5 * FROM NthGroups 
  WHERE NP = 2
  ORDER BY NEWID()
 UNION ALL 
  SELECT TOP 5 * FROM NthGroups 
  WHERE NP = 3
  ORDER BY NEWID()
  UNION ALL 
  SELECT TOP 5 * FROM NthGroups 
  WHERE NP = 4 
  ORDER BY NEWID()
 ) 
 SELECT * FROM Top25Perc

Update

Just read your comment in other answer and you have mentioned you have a column Range with values (1,2,3,4) , this makes query even simpler , you can do something like this

;With 
RandTop5
AS
 (
 SELECT TOP 5 * FROM TableName 
 WHERE [Range] = 1
 ORDER BY NEWID()
 UNION ALL 
  SELECT TOP 5 * FROM TableName
  WHERE [Range] = 2
  ORDER BY NEWID()
 UNION ALL 
  SELECT TOP 5 * FROM TableName
  WHERE [Range] = 3
  ORDER BY NEWID()
  UNION ALL 
  SELECT TOP 5 * FROM TableName
  WHERE [Range] = 4
  ORDER BY NEWID() 
 ) 
 SELECT * FROM RandTop5

Upvotes: 1

undefined_variable
undefined_variable

Reputation: 6228

Select distinct id from table where level=1 order by rand() limit 5 union Select distinct id from table where level=2 order by rand() limit 5 union Select distinct id from table where level=3 order by rand() limit 5 union Select distinct id from table where level=4 order by rand() limit 5

Upvotes: 1

Related Questions