Glaucon
Glaucon

Reputation: 71

Generate normally distributed series using BIgQuery

Is there a way to generate normally distributed series in BQ? ideally specifying the mean and sd of the distribution. I found a way using Marsaglia polar method , but it is not ideal for I do not want polar coordinates of the distribution but to generate an array that follows the parameters specified for it to be normally distributed. Thank you in advance.

Upvotes: 2

Views: 4395

Answers (3)

VictorZurkowski
VictorZurkowski

Reputation: 202

Earlier answers give the probability distribution function of a normal rv. Here I modify previous answers to give a random number generated with the desired distribution, in BQ standard SQL, using the 'polar coordinates' method. The question asks not to use polar coordinates, which is an odd request, since polar coordinates are not use in the generation of the normally distributed random number.

CREATE TEMPORARY FUNCTION rnorm ( mu FLOAT64, sigma FLOAT64 ) AS 
(
    (select mu + sigma*(sqrt( 2*abs(
                                    log( RAND())
                                    )
                            )
                       )*cos( 2*ACOS(-1)*RAND())
    )
)
;

select 
num ,
rnorm(-1, 5.3) as RAND_NORM
FROM UNNEST(GENERATE_ARRAY(1, 17) ) AS num

Upvotes: 3

Ilja
Ilja

Reputation: 1053

The easiest way to do it in BQ is by creating a custom function:

CREATE OR REPLACE FUNCTION 
`your_project.functions.normal_distribution_pdf`
(x ANY TYPE, mu ANY TYPE, sigma ANY TYPE) AS (
(
SELECT 
safe_divide(1,sigma * power(2 * ACOS(-1),0.5)) * exp(-0.5 * power(safe_divide(x-mu,sigma),2))
)
);

Next you only need to apply the function:

with inputs as (
SELECT 1 as x, 0 as mu, 1 as sigma
union all 
SELECT 1.5 as x, 1 as mu, 2 as sigma
 union all 
SELECT 2 as x , 2 as mu, 3 as sigma     

)


SELECT x,
       `your_project.functions.normal_distribution_pdf`(x, mu, sigma) as normal_pdf
from
inputs

Upvotes: 0

Iñigo
Iñigo

Reputation: 2670

This query gives you the euclidean coordinates of the normal distribution centred in 0. You can adjust both the mean (mean variable) or the sd (variance variable) and the x-axis values (GENERATE_ARRAY(beginning,end,step)) :

CREATE TEMPORARY FUNCTION normal(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
  var mean=0;
  var variance=1;
  var x0=1/(Math.sqrt(2*Math.PI*variance));
  var x1=-Math.pow(x-mean,2)/(2*Math.pow(variance,2));
  return x0*Math.pow(Math.E,x1);
""";
WITH numbers AS
  (SELECT x FROM UNNEST(GENERATE_ARRAY(-10, 10,0.5)) AS x)
SELECT x, normal(x) as normal
FROM numbers;

For doing that, I used "User Defined Funtions" [1]. They are used when you want to have another SQL expression or when you want to use Java Script (as I did).

NOTE: I used the probability density function of the normal distribution, if you want to use another you'd need to change variables x0,x1 and the return (I wrote them separately so it's clearer).

Upvotes: 1

Related Questions