Smasell
Smasell

Reputation: 1228

Add column with rolling calculation group by

I have a table like this:

current_date user_id mode_name mode_time
2021-10-01 1 game 10
2021-10-02 1 game 10
2021-10-02 1 tv 30
2021-10-09 1 music 10
2021-10-15 1 music 40
2021-10-01 2 music 10
2021-10-01 2 game 10
2021-10-04 2 game 10
2021-10-04 2 music 20
2021-10-05 2 tv 40
2021-10-11 2 tv 40
2021-10-12 2 game 20

And I want to add two columns:

  1. Column with favourite mode_name, according cumulate sum of mode_time column for every user_id
  2. Column with cumulate sum of mode_time column from favourite mode_name for every user_id

The desired table should look like this:

current_date user_id mode_name mode_time favourite_mode favourite_mode_time
2021-10-01 1 game 10 game 10
2021-10-02 1 game 10 tv 30
2021-10-02 1 tv 30 tv 30
2021-10-09 1 music 10 tv 30
2021-10-15 1 music 40 music 50
2021-10-01 2 music 10 game 10
2021-10-01 2 game 10 game 10
2021-10-04 2 game 10 music 30
2021-10-04 2 music 20 music 30
2021-10-05 2 tv 40 tv 40
2021-10-11 2 tv 40 tv 80
2021-10-12 2 game 20 tv 80

Table can be found here https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=e05302a2cfd81a2a55de811e294f513e

Upvotes: 1

Views: 43

Answers (1)

Guru Stron
Guru Stron

Reputation: 142048

You can use max with partition by user and mode to calculate rolling sum for mode and then use max and max_by to get corresponding values in outer select:

-- sample data
WITH dataset (date, user_id, mode_name, mode_time) AS (
    values ('2021-10-01', 1, 'game', 10),
        ('2021-10-02', 1, 'game', 10),
        ('2021-10-02', 1, 'tv', 30),
        ('2021-10-09', 1, 'music', 10),
        ('2021-10-15', 1, 'music', 40),
        ('2021-10-01', 2, 'game', 10),
        ('2021-10-01', 2, 'music', 10),
        ('2021-10-04', 2, 'game', 10),
        ('2021-10-04', 2, 'music', 20),
        ('2021-10-05', 2, 'tv', 40),
        ('2021-10-11', 2, 'tv', 40),
        ('2021-10-12', 2, 'game', 20)
) 

--query
SELECT date, user_id, mode_name, mode_time,
    max_by(mode_name, mode_time_rolling_time) OVER (
        PARTITION BY user_id
        ORDER BY date
    ) AS favourite_mode,
    max(mode_time_rolling_time) OVER (
        PARTITION BY user_id
        ORDER BY date
    ) AS favourite_mode_time
FROM(
        SELECT *,
            sum(mode_time) OVER (
                PARTITION BY user_id,
                mode_name
                ORDER BY date
            ) AS mode_time_rolling_time
        FROM dataset
    )
ORDER BY user_id, date

Output:

date user_id mode_name mode_time favourite_mode favourite_mode_time
2021-10-01 1 game 10 game 10
2021-10-02 1 game 10 tv 30
2021-10-02 1 tv 30 tv 30
2021-10-09 1 music 10 tv 30
2021-10-15 1 music 40 music 50
2021-10-01 2 game 10 game 10
2021-10-01 2 music 10 game 10
2021-10-04 2 music 20 music 30
2021-10-04 2 game 10 music 30
2021-10-05 2 tv 40 tv 40
2021-10-11 2 tv 40 tv 80
2021-10-12 2 game 20 tv 80

Upvotes: 1

Related Questions