ASP YOK
ASP YOK

Reputation: 131

BigQuery Quantiles

I wanna do some outlier detection with BigQuery and Datalab. MIN and MAX give me the same values but Quantiles are different. BigQuery gave me

Row f0_    f1_      approx_quantiles     
1   0.01   820.55   0.01     
                    190.04   
                    820.55

which seems wrong they are just the min & max. Datalab give me other Quantiles.

I did:

SELECT MIN(Menge),MAX(Menge),APPROX_QUANTILES(Menge,2)  AS approx_quantiles  FROM `nifty-stage-155512.de_veolia_vus_dev_views.tank_data_4`

Do someone else experienced the same or does someone having a solutions???

Edit

From datalab I get

0.25     61.645
0.50    190.000
0.75    253.000
Name: MENGE, dtype: float64

Upvotes: 0

Views: 3844

Answers (2)

ASP YOK
ASP YOK

Reputation: 131

Could be also the solution:

SELECT percentiles[offset(25)], percentiles[offset(50)], percentiles[offset(75)]
FROM (SELECT APPROX_QUANTILES(Menge, 100) percentiles FROM 

Upvotes: 2

Bobbylank
Bobbylank

Reputation: 1946

I'm a little confused. Are you wanting the 25th and 75th quantiles? From the documentation.

APPROX_QUANTILES

APPROX_QUANTILES([DISTINCT] expression, number [{IGNORE|RESPECT} NULLS])

Description

Returns the approximate boundaries for a group of expression values, where number represents the number of quantiles to create. This function returns an array of number + 1 elements, where the first element is the approximate minimum and the last element is the approximate maximum.

If so you need want more quantiles, you need to increase the value for "number". So 4 would give you [min,25th,50th,75th,max] for example

Upvotes: 2

Related Questions