Reputation: 131
I wanna do some outlier detection with BigQuery and Datalab. MIN and MAX give me the same values but Quantiles are different. BigQuery gave me
Row f0_ f1_ approx_quantiles
1 0.01 820.55 0.01
190.04
820.55
which seems wrong they are just the min & max. Datalab give me other Quantiles.
I did:
SELECT MIN(Menge),MAX(Menge),APPROX_QUANTILES(Menge,2) AS approx_quantiles FROM `nifty-stage-155512.de_veolia_vus_dev_views.tank_data_4`
Do someone else experienced the same or does someone having a solutions???
Edit
From datalab I get
0.25 61.645
0.50 190.000
0.75 253.000
Name: MENGE, dtype: float64
Upvotes: 0
Views: 3844
Reputation: 131
Could be also the solution:
SELECT percentiles[offset(25)], percentiles[offset(50)], percentiles[offset(75)]
FROM (SELECT APPROX_QUANTILES(Menge, 100) percentiles FROM
Upvotes: 2
Reputation: 1946
I'm a little confused. Are you wanting the 25th and 75th quantiles? From the documentation.
APPROX_QUANTILES
APPROX_QUANTILES([DISTINCT] expression, number [{IGNORE|RESPECT} NULLS])
Description
Returns the approximate boundaries for a group of expression values, where number represents the number of quantiles to create. This function returns an array of number + 1 elements, where the first element is the approximate minimum and the last element is the approximate maximum.
If so you need want more quantiles, you need to increase the value for "number". So 4 would give you [min,25th,50th,75th,max] for example
Upvotes: 2