David Hedlund

Reputation: 129802

Select rows randomly distributed around a given mean

I have a table that has a value field. The records have values somewhat evenly distributed between 0 and 100.

I want to query this table for n records, given a target mean, x, so that I'll receive a weighted random result set where avg(value) will be approximately x.

I could easily do something like

SELECT TOP n * FROM table ORDER BY abs(x - value)

... but that would give me the same result every time I run the query.

What I want to do is to add weighting of some sort, so that any record may be selected, but with diminishing probability as the distance from x increases, so that I'll end up with something like a normal distribution around my given mean.

I would appreciate any suggestions as to how I can achieve this.

Upvotes: 2

Views: 256

Answers (2)

Conrad Frix

Reputation: 52675

Why not use the RAND() function?

SELECT TOP n * FROM table ORDER BY abs(x - value) + RAND()

EDIT

Using RAND() won't work, because within a single SELECT it returns the same number for every row. Heximal was right to use NEWID(), but it needs to be used directly in the ORDER BY expression:

SELECT TOP N value
FROM table
ORDER BY
    abs(X - value) + cast(cast(newid() as varbinary) as integer) / 10000000000

The large divisor 10000000000 keeps the random term small (roughly ±0.2), so that avg(value) stays close to X and the average of abs(X - value) stays low.
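
If the goal is a selection probability that actually falls off with distance from X, rather than a small shuffle of the nearest rows, one option is to order by a weighted random key. The sketch below is not the query above. It swaps in the Efraimidis-Spirakis weighted sampling trick (key = -ln(u) / weight, keep the n smallest keys) with a Gaussian weight. The table name dbo.MyTable, the variables @n, @x and @sigma, and the value @sigma = 15 are all assumptions made for illustration.

```sql
-- Hypothetical names: dbo.MyTable, @n, @x and @sigma are placeholders,
-- not taken from the question.
DECLARE @n int = 10;        -- number of rows to return
DECLARE @x float = 50;      -- target mean
DECLARE @sigma float = 15;  -- spread: larger values admit rows further from @x

SELECT TOP (@n) *
FROM dbo.MyTable
ORDER BY
    -- u: a per-row uniform random number strictly between 0 and 1
    -LOG((CAST(CHECKSUM(NEWID()) & 2147483647 AS float) + 1.0) / 2147483649.0)
    -- dividing by the Gaussian weight exp(-d^2 / (2 sigma^2)) is the same as
    -- multiplying by exp(+d^2 / (2 sigma^2)), which avoids a divide-by-zero
    * EXP(SQUARE(value - @x) / (2.0 * SQUARE(@sigma)));
```

A smaller @sigma concentrates the sample near @x, while a larger one approaches a plain random pick. avg(value) will only land near @x when the data is roughly symmetric around it, so targets close to 0 or 100 will be pulled towards the middle. EXP raises an overflow error once its argument exceeds about 709, so very small @sigma values would need a guard.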

All that said, asking the question (without the SQL bits) on https://stats.stackexchange.com/ may get you better results.

Upvotes: 2

heximal

Reputation: 10517

Try

SELECT TOP n * FROM table ORDER BY abs(x - value), newid()

or

select * from (
    SELECT TOP n * FROM table ORDER BY abs(x - value)
  ) a order by newid()

Upvotes: 0
