Reputation: 259
I am using BigQuery, and I need to compute the 25th, 50th, and 75th percentile of a column of a dataset.
For example, how can I get the aforementioned numbers using BigQuery and STANDARD SQL. I have looked at the PERCENT_RANK, RANK, and NTILE functions but I can't seem to crack it.
Here's some code that may guide me
Appreciate the help!
Upvotes: 25
Views: 71397
Reputation: 3087
You could get a fine-grain percentile profiling using the code below.
SELECT
offset + 1 AS percentile,
value AS percentile_value,
FROM UNNEST((
SELECT
APPROX_QUANTILES(value, 100) as percentile_value,
FROM data_set
)) AS percentile_value WITH OFFSET as offset
Then you could get any percentile as needed.
Upvotes: -1
Reputation: 335
In case approximate aggregation does not work for you, you might want to use the PERCENTILE_CONT function (though it will use much more memory so it might not work for huge data), e.g. the following example is from here
SELECT
PERCENTILE_CONT(x, 0) OVER() AS min,
PERCENTILE_CONT(x, 0.01) OVER() AS percentile1,
PERCENTILE_CONT(x, 0.5) OVER() AS median,
PERCENTILE_CONT(x, 0.9) OVER() AS percentile90,
PERCENTILE_CONT(x, 1) OVER() AS max
FROM UNNEST([0, 3, NULL, 1, 2]) AS x LIMIT 1;
+-----+-------------+--------+--------------+-----+
| min | percentile1 | median | percentile90 | max |
+-----+-------------+--------+--------------+-----+
| 0 | 0.03 | 1.5 | 2.7 | 3 |
+-----+-------------+--------+--------------+-----+
Upvotes: 10
Reputation: 14004
Check out APPROX_QUANTILES function in Standard SQL. If you ask for 100 quantiles - you get percentiles. So the query will look like following:
SELECT percentiles[offset(25)], percentiles[offset(50)], percentiles[offset(75)]
FROM (SELECT APPROX_QUANTILES(column, 100) percentiles FROM Table)
Upvotes: 27