Manikandan
Manikandan

Reputation: 894

Get random data from SQL Server but no repeated values

I need to get 10 random rows from table at each time, but rows shall never repeat when I repeat the query.

But if I get all rows it will repeat again from one, like table has 20 rows, at first time I get 10 random rows, 2nd time I will need to get remaining 10 rows and at my 3rd query I need to get 10 rows randomly.

Currently my query for getting 10 rows randomly:

SELECT TOP 10 * 
FROM tablename
ORDER BY NEWID() 

But MSDN suggest this query

SELECT TOp 10 * FROM Table1
  WHERE (ABS(CAST(
  (BINARY_CHECKSUM(*) *
  RAND()) as int)) % 100) < 10 

For good performance. But this query not return constant rows. Could you please suggest something on this

Upvotes: 3

Views: 3902

Answers (4)

Twinkles
Twinkles

Reputation: 1994

The most elegant solution, provided you do the consecutive queries within a certain amount of time, would be to use a cursor:

DECLARE rnd_cursor CURSOR FOR 
  SELECT col1, col2, ... 
  FROM tablename
  ORDER BY NEWID(); 

OPEN rnd_cursor;  
FETCH NEXT FROM rnd_cursor;  -- Repeat ten times

Keep the cursor open and just keep fetching rows as you need them. Close the cursor when you're done:

CLOSE rnd_cursor;  
DEALLOCATE rnd_cursor;  

As for the second part of your question, once you fetched the last row, open a new cursor:

IF @@FETCH_STATUS <> 0
BEGIN
  CLOSE rnd_cursor;
  OPEN rnd_cursor;
END;

Upvotes: 0

Cato
Cato

Reputation: 3701

what I would do is have two new fields, SELECTED (int) and TimesSelected (integer) then

UPDATE tablename SET SELECTED = 0;

WITH CTE AS (SELECT TOP 10 * 
FROM tablename
ORDER BY TimesSelected  ASC, NEWID())
UPDATE CTE SET SELECTED = 1,  TimesSelected  = TimesSelected  + 1;

SELECT * from  tablename  WHERE SELECTED = 1; 

so if you use that each time, once selected a record goes to the top of the pile, and records below it are selected randomly.

you might want to put an index on SELECTED and do

UPDATE tablename SET SELECTED = 0 WHERE SELECTED = 1;   -- for performance

Upvotes: 0

Vladimir Baranov
Vladimir Baranov

Reputation: 32695

As Jiri Tousek said, each query that you run has to know what previous queries returned.

Instead of inserting the IDs of previously returned rows in a table and then checking that new result is not in that table yet, I'd simply add a column to the table with the random number that would define a new random order of rows.

You populate this column with random numbers once.

This will remember the random order of rows and make it stable, so all you need to remember between your queries is how many random rows you have requested so far. Then just fetch as many rows as needed starting from where you stopped in the previous query.


Add a column RandomNumber binary(8) to the table. You can choose a different size. 8 bytes should be enough.

Populate it with random numbers. Once.

UPDATE tablename
SET RandomNumber = CRYPT_GEN_RANDOM(8)

Create an index on RandomNumber column. Unique index. If it turns out that there are repeated random numbers (which is unlikely for 20,000 rows and random numbers 8 bytes long), then re-generate random numbers (run the UPDATE statement once again) until all of them are unique.

Request first 10 random rows:

SELECT TOP(10) *
FROM tablename
ORDER BY RandomNumber

As you process/use these 10 random rows remember the last used random number. The best way to do it depends on how you process these 10 random rows.

DECLARE @VarLastRandomNumber binary(8);
SET @VarLastRandomNumber = ... 
-- the random number from the last row returned by the previous query

Request next 10 random rows:

SELECT TOP(10) *
FROM tablename
WHERE RandomNumber > @VarLastRandomNumber
ORDER BY RandomNumber

Process them and remember the last used random number.

Repeat. As a bonus you can request different number of random rows on each iteration (it doesn't have to be 10 each time).

Upvotes: 0

Jiri Tousek
Jiri Tousek

Reputation: 12440

Since required outcome of your second query depends on the (random) outcome of the first query, the querying cannot be stateless. You'll need to store the state (info about the previous query/queries) somewhere, somehow.

The simplest solution would probably be storing the already-retrieved rows or their IDs in a temporary table and then querying ... where id not in (select id from temp_table) in the second query.

Upvotes: 2

Related Questions