Reputation: 377
I know you can get the average, total, min, and max over a subset of the data using a window function. But is it possible to get, say, the median, or the 25th percentile instead of the average with the window function?
Put another way, how do I rewrite this to get the id and the 25th or 50th percentile sales numbers within each district rather than the average?
SELECT id, avg(sales)
OVER (PARTITION BY district) AS district_average
FROM t
Upvotes: 8
Views: 14729
Reputation: 10335
Another way to to this without joining as in Gordon's solution is by exploiting the array_agg
function which can be used as a window function:
create function pg_temp.percentile_cont_window
(c double precision[], p double precision)
returns double precision
language sql
as
$$
with t1 as (select unnest(c) as x)
select percentile_cont(p) WITHIN GROUP (ORDER BY x) from t1;
$$
;
-- -- -- -- -- -- -- -- --
-- Usage examples:
create temporary table t1 as (
select 1 as g, 1 as x
union select 1 as g, 2 as x
union select 2 as g, 3 as x
);
-- Built-in function raises an error if used without group:
-- Error: OVER is not supported for ordered-set aggregate percentile_cont
select *, percentile_cont(.1) within group (order by x) over() from t1;
-- Built-in function with grouping
select g, percentile_cont(.1) within group (order by x) from t1 group by g;
-- | g | percentile_cont |
-- |----:|------------------:|
-- | 1 | 1.1 |
-- | 2 | 3 |
-- Custom function basic usage (note that this is without grouping)
select t1.*, pg_temp.percentile_cont_window(array_agg(x) over(), .1) from t1;
-- | g | x | percentile_cont_window |
-- |----:|----:|-------------------------:|
-- | 1 | 2 | 1.2 |
-- | 1 | 1 | 1.2 |
-- | 2 | 3 | 1.2 |
-- Custom function usage with grouping is the same as using the built-in percentile_cont function
select t1.g, pg_temp.percentile_cont_window(array_agg(x), .1) from t1 group by g;
-- | g | percentile_cont_window |
-- |----:|-------------------------:|
-- | 2 | 3 |
-- | 1 | 1.1 |
Upvotes: 1
Reputation: 1269513
You can write this as an aggregation function using percentile_cont()
or percentile_disc()
:
select district, percentile_cont(0.25) within group (order by sales)
from t
group by district;
Unfortunately, Postgres doesn't currently support these as a window functions:
select id, percentile_cont(0.25) within group (order by sales) over (partition by district)
from t;
So, you can use a join
:
select t.*, p_25, p_75
from t join
(select district,
percentile_cont(0.25) within group (order by sales) as p_25,
percentile_cont(0.75) within group (order by sales) as p_75
from t
group by district
) td
on t.district = td.district
Upvotes: 16