Shahin
Shahin

Reputation: 1316

Subset a table such that the samples are normally distributed around a value in a column

I have the following table called original_table:

name      height      age
personA   180         21
personB   190         37
personC   168         27
personD   182         56
...

My goal is to create two random samples of size 100 from the original_table such that the mean of age is normally distributed around 25, and height average is close to 175. Basically, a person with age 25 and height 175 has the highest chance of being in the tables, but not guaranteed.

Upvotes: 0

Views: 49

Answers (1)

nayi224
nayi224

Reputation: 565

The trick is to find the normal distribution function from other languages, and then convert it to mysql syntax. For example, convert from the following java code to mysql

    //java.util.Random
    synchronized public double nextGaussian() {
        // See Knuth, ACP, Section 3.4.1 Algorithm C.
        if (haveNextNextGaussian) {
            haveNextNextGaussian = false;
            return nextNextGaussian;
        } else {
            double v1, v2, s;
            do {
                v1 = 2 * nextDouble() - 1; // between -1 and 1
                v2 = 2 * nextDouble() - 1; // between -1 and 1
                s = v1 * v1 + v2 * v2;
            } while (s >= 1 || s == 0);
            double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
            nextNextGaussian = v2 * multiplier;
            haveNextNextGaussian = true;
            return v1 * multiplier;
        }
    }
with recursive t0(v1, v2, s, num1, num2, rn) as (
select @v1:= rand() * 2 - 1,
       @v2:= rand() * 2 - 1,
       @s:= @v1*@v1 + @v2*@v2,
       sqrt(-2 * log(@s) / (@s)) * @v1,
       sqrt(-2 * log(@s) / (@s)) * @v2,
       @rn:=case when @s < 1 and @s > 0 then 1 else 0 end
 union all
select @v1:= rand() * 2 - 1,
       @v2:= rand() * 2 - 1,
       @s:= @v1*@v1 + @v2*@v2,
       sqrt(-2 * log(@s) / (@s)) * @v1,
       sqrt(-2 * log(@s) / (@s)) * @v2,
       @rn:=case when @s < 1 and @s > 0 then 1 else 0 end + @rn
  from t0
 where @rn < 100
)
select 175 + t1.num1 * 10,
       175 + t1.num2 * 10,
       25 + t1.num1 * 8,
       25 + t1.num2 * 8
  from t0 t1
 where t1.num1 is not null
 order by t1.num1
;


Upvotes: 1

Related Questions