Reputation: 21
I have data in the following schema:
| user_id | date | score |
| ------- | ------ | ------ |
| 1 | 201901 | 1 |
| 1 | 201902 | 2 |
| 1 | 201903 | 3 |
| 2 | 201901 | 1 |
| 2 | 201902 | -1 |
| 2 | 201903 | 2 |
I need to come up with the following result:
| user_id | one_score | two_score | three_score | max_score | min_score |
| ------- | ---------- | --------- | ----------- | --------- | --------- |
| 1 | 1 | 3 | 6 | 3 | 1 |
| 2 | 1 | 0 | 2 | 2 | -1 |
Note that one_score is the sum of the first score, two_score is the sum of the first two scores, and three_score is the sum of the first three scores associated with a user_id.
The general layout of the query I have so far is:
SELECT
user_id,
MAX(score),
MIN(score)
FROM scores
GROUP BY user_id
I'm not sure what the best approach is for calculating one_score, two_score and three_score. One possible approach is to write a custom aggregate function for each case, that takes in as input the entire column:
SELECT
user_id,
MAX(score),
MIN(score),
one_score(score),
two_score(score),
three_score(score)
FROM scores
GROUP BY user_id
I was wondering if there was a better approach than this involving window functions. It seems like what I should be changing in each column is the number of rows the sum function is being applied on, instead of writing individual functions for each case. How would I write a window function for the rolling sums one_score, two_score, three_score?
Note - this is a simplified case modeled from a "real-world" case with two differences:
Upvotes: 2
Views: 155
Reputation: 65228
In most DBMSs, including Postgres, you can use the `sum(..) over (partition by ... order by ...)`, `max(..) over (partition by ...)` and `min(..) over (partition by ...)` window (analytic) functions for your case. But this way you get unpivoted results that still need to be pivoted, and pivoting needs one more value: the ordinality of each score. Therefore a `rank()` or `row_number()` function is needed within the subquery, so that the values it yields can be used in the main query. As a result, consider:
select user_id,
max(case when rnk = 1 then score end) as one_score,
max(case when rnk = 2 then score end) as two_score,
max(case when rnk = 3 then score end) as three_score,
max(max_score) as max_score,
min(min_score) as min_score
from
(
select user_id,
rank() over ( partition by user_id order by date ) as rnk,
sum(score) over ( partition by user_id order by date ) as score,
max(score) over ( partition by user_id ) as max_score,
min(score) over ( partition by user_id ) as min_score
from scores
) q
group by user_id
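If you want to sanity-check this on the sample data without a Postgres instance handy, the same query also runs against SQLite's in-memory database (window functions need SQLite >= 3.25; an `order by user_id` is added at the end only to make the output order deterministic):

```python
import sqlite3

# Build the sample scores table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (user_id INT, date INT, score INT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(1, 201901, 1), (1, 201902, 2), (1, 201903, 3),
     (2, 201901, 1), (2, 201902, -1), (2, 201903, 2)],
)

# The rank()/running-sum query from the answer, plus ORDER BY for
# a deterministic row order.
rows = conn.execute("""
    select user_id,
           max(case when rnk = 1 then score end) as one_score,
           max(case when rnk = 2 then score end) as two_score,
           max(case when rnk = 3 then score end) as three_score,
           max(max_score) as max_score,
           min(min_score) as min_score
    from (
        select user_id,
               rank() over (partition by user_id order by date) as rnk,
               sum(score) over (partition by user_id order by date) as score,
               max(score) over (partition by user_id) as max_score,
               min(score) over (partition by user_id) as min_score
        from scores
    ) q
    group by user_id
    order by user_id
""").fetchall()
print(rows)  # [(1, 1, 3, 6, 3, 1), (2, 1, 0, 2, 2, -1)]
```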
Upvotes: 1
Reputation: 121604
I like the OP's idea of the custom aggregate:
create or replace function limited_sum_state(int[], int, int)
returns int[] language plpgsql as $$
begin
  -- $1 = state {count, sum}, $2 = limit, $3 = next score
  if $1[1] < $2 then  -- accumulate only the first $2 values
    $1[1] := $1[1] + 1;
    $1[2] := $1[2] + $3;
  end if;
  return $1;
end $$;
create or replace function limited_sum_final(int[])
returns int language sql as $$
  select $1[2]  -- return the accumulated sum
$$;
create aggregate sum_of_first_elements(int, int) (
  sfunc = limited_sum_state,
  stype = int[],
  finalfunc = limited_sum_final,
  initcond = '{0, 0}');  -- start with count = 0, sum = 0
Now we can write the query in an elegant way:
select
user_id,
sum_of_first_elements(1, score order by date) as one_score,
sum_of_first_elements(2, score order by date) as two_score,
sum_of_first_elements(3, score order by date) as three_score,
max(score) as max_score,
min(score) as min_score
from scores
group by user_id;
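For readers less familiar with Postgres aggregates, the state transition above is just a fold: the state is a (count, sum) pair, and each row is added only while the count is below the limit. A rough Python equivalent (the name `sum_of_first` is ours, purely for illustration):

```python
def sum_of_first(limit, values):
    """Mimic the sum_of_first_elements aggregate: sum only the
    first `limit` values, tracked by a (count, sum) state pair."""
    count, total = 0, 0            # initcond = '{0, 0}'
    for v in values:               # one limited_sum_state call per row
        if count < limit:          # accumulate only the first `limit` values
            count += 1
            total += v
    return total                   # limited_sum_final returns state[2]

scores_user_2 = [1, -1, 2]         # user 2's scores, ordered by date
print([sum_of_first(n, scores_user_2) for n in (1, 2, 3)])  # [1, 0, 2]
```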
Upvotes: 1
Reputation: 37472
You can use the `row_number()` window function to number the rows per user and then use these numbers in a `FILTER` clause on `sum()`.
SELECT x.user_id,
sum(x.score) FILTER (WHERE x.rn <= 1) one_score,
sum(x.score) FILTER (WHERE x.rn <= 2) two_score,
sum(x.score) FILTER (WHERE x.rn <= 3) three_score,
max(x.score) max_score,
min(x.score) min_score
FROM (SELECT s.user_id,
s.score,
row_number() OVER (PARTITION BY s.user_id
ORDER BY s.date) rn
FROM scores s) x
GROUP BY x.user_id;
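This query can also be checked against the sample data using SQLite's in-memory database (`FILTER` on ordinary aggregates needs SQLite >= 3.30; Postgres has supported it since 9.4; `ORDER BY user_id` is added only for deterministic output):

```python
import sqlite3

# Build the sample scores table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (user_id INT, date INT, score INT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(1, 201901, 1), (1, 201902, 2), (1, 201903, 3),
     (2, 201901, 1), (2, 201902, -1), (2, 201903, 2)],
)

# The row_number()/FILTER query from the answer.
rows = conn.execute("""
    SELECT x.user_id,
           sum(x.score) FILTER (WHERE x.rn <= 1) one_score,
           sum(x.score) FILTER (WHERE x.rn <= 2) two_score,
           sum(x.score) FILTER (WHERE x.rn <= 3) three_score,
           max(x.score) max_score,
           min(x.score) min_score
    FROM (SELECT s.user_id,
                 s.score,
                 row_number() OVER (PARTITION BY s.user_id
                                    ORDER BY s.date) rn
          FROM scores s) x
    GROUP BY x.user_id
    ORDER BY x.user_id
""").fetchall()
print(rows)  # [(1, 1, 3, 6, 3, 1), (2, 1, 0, 2, 2, -1)]
```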
Upvotes: 3