Matt Burland
Matt Burland

Reputation: 45155

Select random rows with seeding

Using SQL Server, I have a table with around 5.5 million rows and I want to randomly select a set of maybe 120 rows that meet some criteria.

That's some what related to Select n random rows from SQL Server table and https://msdn.microsoft.com/en-us/library/cc441928.aspx, but my problem is that I want to be able to seed this so I can randomly pick the same 120 rows consistently and then get a different, random set of rows if I use a different seed.

I could do something like this in my application:

var rand = new Random(seed);
var allExamples = db.myTable.Where(/*some condition*/).ToList();
var subSet = db.myTable.Select(x => new { x, r = rand.NextDouble())
    .OrderBy(x => x.r)
    .Take(120)
    .Select(x => x.x).ToList();

Which works, but, as you might guess, with 5.5 million rows is glacially slow. So I'm really looking for a way to make this work on the SQL server side so I don't have to retrieve and process all the rows.

Upvotes: 7

Views: 220

Answers (2)

Bobson
Bobson

Reputation: 13706

Assign a unique guid to each row as an indexed column (but not a clustered one). Choose three alpha-numeric characters randomly, and then select all rows where the guid starts with those characters.

36*36*36=46,656
5,500,000/46,656 ~= 117.88

Since newid() doesn't follow a pattern, that provides your random grouping, evenly distributed, and if you use the same three characters you'll always get the same data back, which covers the seeding.

If that isn't performant enough, create another column to specifically index the first three characters.

(Sorry for the brevity - on my phone)

Upvotes: 0

Matthew Whited
Matthew Whited

Reputation: 22443

If you want something that looks random then mix your [PrimaryKey] with some other data...

SELECT *
FROM [your table]
ORDER BY
    CHECKSUM([primarykey]) ^ CHECKSUM('your seed') 

... this will still be a table scan but it should have better performance then pulling the entire set of data do your client just to throw away everything except 120 rows.

Upvotes: 2

Related Questions