Reputation: 2119
The query below groups the results from first
into 4 equally spaced date bins and aggregates an average for the_value
in each bin.
WITH first as(
SELECT
extract(EPOCH FROM foo.t_date) as the_date,
foo_val as the_value
FROM bar
INNER JOIN foo
ON
foo.user_id = bar.x_id
and
foo.user_name = 'xxxx'
)
SELECT bin, round(sum(bin_sum) OVER w /sum(bin_ct) OVER w, 2) AS running_avg
FROM (
SELECT width_bucket(first.the_date
, x.min_epoch, x.max_epoch, x.bins) AS bin
, sum(first.the_value) AS bin_sum
, count(*) AS bin_ct
FROM first
, (SELECT MIN(first.the_date) AS min_epoch
, MAX(first.the_date) AS max_epoch
, 4 AS bins
FROM first
) x
GROUP BY 1
) sub
WINDOW w AS (ORDER BY bin)
ORDER BY 1;
I would like to be able to only calculate the average for the lowest say 20 the_value
's in each bin. From other posts here on Stackoverflow I have seen that this is possible and that perhaps ORDER BY the_value
and rank()
is the best way to go about it. But my struggle is that I'm not sure where my current query should be modified to implement this.
Any insight would be appreciated.
Postgres version 9.3
Upvotes: 3
Views: 4520
Reputation: 656351
Use row_number()
on each bin.
First compute the row number rn
, then apply WHERE rn < 21
in the next step:
WITH first AS (
SELECT extract(EPOCH FROM foo.t_date) AS the_date
, foo_val AS the_value
FROM bar
JOIN foo ON foo.user_id = bar.x_id
AND foo.user_name = 'xxxx'
)
, x AS (
SELECT MIN(the_date) AS min_epoch
, MAX(the_date) AS max_epoch
FROM first
)
, y AS (
SELECT width_bucket(f.the_date, x.min_epoch, x.max_epoch, 4) AS bin, *
FROM first f, x
)
, z AS (
SELECT row_number() OVER (PARTITION BY bin ORDER BY the_value) AS rn, *
FROM y
)
SELECT bin, round(sum(bin_sum) OVER w / sum(bin_ct) OVER w, 2) AS running_avg
FROM (
SELECT bin
, sum(the_value) AS bin_sum
, count(*) AS bin_ct
FROM z
WHERE rn < 21 -- max 20 lowest values
GROUP BY 1
) sub
WINDOW w AS (ORDER BY bin)
ORDER BY 1;
CTEs y
and z
could be conflated. Similarly first
and x
could be conflated.
But it's clearer that way.
Untested, since we don't have test data.
Upvotes: 2