Reputation: 21
I have data in the following schema:
| user_id | date | score |
| ------- | ------ | ------ |
| 1 | 201901 | 1 |
| 1 | 201902 | 2 |
| 1 | 201903 | 3 |
| 2 | 201901 | 1 |
| 2 | 201902 | -1 |
| 2 | 201903 | 2 |
I need to come up with the following result:
| user_id | one_score | two_score | three_score | max_score | min_score |
| ------- | ---------- | --------- | ----------- | --------- | --------- |
| 1 | 1 | 3 | 6 | 3 | 1 |
| 2 | 1 | 0 | 2 | 2 | -1 |
Note that one_score is the sum of the first score, two_score is the sum of the first two scores, and three_score is the sum of the first three scores associated with a user_id.
The general layout of the query I have so far is:
SELECT
user_id,
MAX(score),
MIN(score)
FROM scores
GROUP BY user_id
I'm not sure what the best approach is for calculating one_score, two_score and three_score. One possible approach is to write a custom aggregate function for each case, that takes in as input the entire column:
SELECT
user_id,
MAX(score),
MIN(score),
one_score(score),
two_score(score),
three_score(score)
FROM scores
GROUP BY user_id
I was wondering if there was a better approach than this involving window functions. It seems like what I should be changing in each column is the number of rows the sum function is being applied on, instead of writing individual functions for each case. How would I write a window function for the rolling sums one_score, two_score, three_score?
Note - this is a simplified case modeled from a "real-world" case with two differences:
Upvotes: 2
Views: 155
Reputation: 65228
In most DBMSs, including Postgres, you can use the `sum(..) over (partition by ... order by ...)`, `max(..) over (partition by ...)` and `min(..) over (partition by ...)` window (analytic) functions for your case. But this way you get unpivoted results that still need to be pivoted, and pivoting needs one more value: the ordinality of each score. Therefore a `rank()` or `row_number()` function is needed within the subquery, so that the values it yields can be used in the main query. As a result, consider:
select user_id,
max(case when rnk = 1 then score end) as one_score,
max(case when rnk = 2 then score end) as two_score,
max(case when rnk = 3 then score end) as three_score,
max(max_score) as max_score,
min(min_score) as min_score
from
(
select user_id,
rank() over ( partition by user_id order by date ) as rnk,
sum(score) over ( partition by user_id order by date ) as score,
max(score) over ( partition by user_id ) as max_score,
min(score) over ( partition by user_id ) as min_score
from scores
) q
group by user_id
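If you want to sanity-check this on the sample data without a Postgres instance handy, the same query also runs against SQLite's in-memory database (window functions need SQLite >= 3.25; an `order by user_id` is added at the end only to make the output order deterministic):

```python
import sqlite3

# Build the sample scores table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (user_id INT, date INT, score INT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(1, 201901, 1), (1, 201902, 2), (1, 201903, 3),
     (2, 201901, 1), (2, 201902, -1), (2, 201903, 2)],
)

# The rank()/running-sum query from the answer, plus ORDER BY for
# a deterministic row order.
rows = conn.execute("""
    select user_id,
           max(case when rnk = 1 then score end) as one_score,
           max(case when rnk = 2 then score end) as two_score,
           max(case when rnk = 3 then score end) as three_score,
           max(max_score) as max_score,
           min(min_score) as min_score
    from (
        select user_id,
               rank() over (partition by user_id order by date) as rnk,
               sum(score) over (partition by user_id order by date) as score,
               max(score) over (partition by user_id) as max_score,
               min(score) over (partition by user_id) as min_score
        from scores
    ) q
    group by user_id
    order by user_id
""").fetchall()
print(rows)  # [(1, 1, 3, 6, 3, 1), (2, 1, 0, 2, 2, -1)]
```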
Upvotes: 1
Reputation: 121604
I like the OP's idea of the custom aggregate:
create or replace function limited_sum_state(int[], int, int)
returns int[] language plpgsql as $$
begin
  -- $1 = state {count, sum}, $2 = limit, $3 = next score
  if $1[1] < $2 then  -- accumulate only the first $2 values
    $1[1] := $1[1] + 1;
    $1[2] := $1[2] + $3;
  end if;
  return $1;
end $$;
create or replace function limited_sum_final(int[])
returns int language sql as $$
  select $1[2]  -- return the accumulated sum
$$;
create aggregate sum_of_first_elements(int, int) (
  sfunc = limited_sum_state,
  stype = int[],
  finalfunc = limited_sum_final,
  initcond = '{0, 0}');  -- start with count = 0, sum = 0
Now we can write the query in an elegant way:
select
user_id,
sum_of_first_elements(1, score order by date) as one_score,
sum_of_first_elements(2, score order by date) as two_score,
sum_of_first_elements(3, score order by date) as three_score,
max(score) as max_score,
min(score) as min_score
from scores
group by user_id;
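For readers less familiar with Postgres aggregates, the state transition above is just a fold: the state is a (count, sum) pair, and each row is added only while the count is below the limit. A rough Python equivalent (the name `sum_of_first` is ours, purely for illustration):

```python
def sum_of_first(limit, values):
    """Mimic the sum_of_first_elements aggregate: sum only the
    first `limit` values, tracked by a (count, sum) state pair."""
    count, total = 0, 0            # initcond = '{0, 0}'
    for v in values:               # one limited_sum_state call per row
        if count < limit:          # accumulate only the first `limit` values
            count += 1
            total += v
    return total                   # limited_sum_final returns state[2]

scores_user_2 = [1, -1, 2]         # user 2's scores, ordered by date
print([sum_of_first(n, scores_user_2) for n in (1, 2, 3)])  # [1, 0, 2]
```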
Upvotes: 1
Reputation: 37472
You can use the `row_number()` window function to number the rows per user and then use these numbers in a `FILTER` clause on `sum()`.
SELECT x.user_id,
sum(x.score) FILTER (WHERE x.rn <= 1) one_score,
sum(x.score) FILTER (WHERE x.rn <= 2) two_score,
sum(x.score) FILTER (WHERE x.rn <= 3) three_score,
max(x.score) max_score,
min(x.score) min_score
FROM (SELECT s.user_id,
s.score,
row_number() OVER (PARTITION BY s.user_id
ORDER BY s.date) rn
FROM scores s) x
GROUP BY x.user_id;
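This query can also be checked against the sample data using SQLite's in-memory database (`FILTER` on ordinary aggregates needs SQLite >= 3.30; Postgres has supported it since 9.4; `ORDER BY user_id` is added only for deterministic output):

```python
import sqlite3

# Build the sample scores table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (user_id INT, date INT, score INT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(1, 201901, 1), (1, 201902, 2), (1, 201903, 3),
     (2, 201901, 1), (2, 201902, -1), (2, 201903, 2)],
)

# The row_number()/FILTER query from the answer.
rows = conn.execute("""
    SELECT x.user_id,
           sum(x.score) FILTER (WHERE x.rn <= 1) one_score,
           sum(x.score) FILTER (WHERE x.rn <= 2) two_score,
           sum(x.score) FILTER (WHERE x.rn <= 3) three_score,
           max(x.score) max_score,
           min(x.score) min_score
    FROM (SELECT s.user_id,
                 s.score,
                 row_number() OVER (PARTITION BY s.user_id
                                    ORDER BY s.date) rn
          FROM scores s) x
    GROUP BY x.user_id
    ORDER BY x.user_id
""").fetchall()
print(rows)  # [(1, 1, 3, 6, 3, 1), (2, 1, 0, 2, 2, -1)]
```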
Upvotes: 3