Reputation: 11044
I want to query a rather large table (millions of rows), providing a seed value, in a manner that will guarantee a random order - but one that remains stable across multiple queries as long as the same seed is used.
The best I've come up with so far is
SELECT TOP n *
FROM tbl t
ORDER BY t.int_column % seed, t.int_column
Is this a usable approach, both from a performance standpoint and a somewhat-uniform distribution of result rows over different seeds?
Edit:
For context, the stable sort is needed because of multiple - possibly nested - WHERE NOT IN
queries that operate on the same data set; e.g.
SELECT *
FROM tbl t
WHERE t.some_criteria = 'some_value'
AND t.id NOT IN
(
SELECT TOP n t.id
FROM tbl t
WHERE t.some_other_criteria = 'some_other_value'
ORDER BY t.int_column % seed, t.int_column
)
AND t.id NOT IN
(
# etc.
)
When the order of the subselects is random, but not stable (i.e. NEWID()
, TABLESAMPLE()
), the result rows fluctuate wildly between executions.
Upvotes: 0
Views: 233
Reputation: 4442
Just a thought... You could add a "RamdomSort" column to you table. That way the the sort order will be truly random but will remain repeatable repeatable until you update the table with new values. Something along these lines...
ALTER TABLE dbo.MyTable ADD RandomSort INT NOT NULL
CONSTRAINT df_MyTable_RandomSort DEFAULT(0);
UPDATE mt SET
mt.RandomSort = ABS(CHECKSUM(NEWID())) % 100000 + 1
FROM
dbo.MyTable mt;
SELECT
*
FROM
dbo.MyTable mt
ORDER BY
mt.SomeValue;
If the situation warrants it, you could even add a covering, nonclustered index to eliminate the sort operation.
Upvotes: 0
Reputation: 27673
If you want random-looking ordering, you can do this with HASHBYTES
and some piece of data from the row you're selecting.
SELECT TOP 100 *
FROM tbl t
ORDER BY HASHBYTES('SHA1', CONCAT(STR(t.int_column), 'seed string'))
Now, the performance of this is a big question. Modern CPUs do SHA1 very quickly, so this might be good enough for your needs.
If you can more about performance and less about "good randomness," you could drop in a simple linear congruential generator as the transformation function:
SET ARITHABORT OFF;
SET ARITHIGNORE ON;
SET ANSI_WARNINGS OFF;
SELECT TOP 100 *
FROM tbl t
ORDER BY ((t.int_column + seed_number) * 1103515245 + 12345)
This will be faster, but less random.
Upvotes: 1