Reputation:
I have the following structure:
table user
user_id | month_year | fruits
------------------------------
1 | 2021-01 | apple
1 | 2021-01 | melon
1 | 2021-01 | orange
1 | 2021-02 | grape
1 | 2021-02 | orange
1 | 2021-02 | kiwi
1 | 2021-03 | grape
1 | 2021-03 | pear
1 | 2021-03 | banana
1 | 2021-04 | orange
1 | 2021-04 | kiwi
1 | 2021-04 | banana
1 | 2021-05 | grape
1 | 2021-05 | pear
1 | 2021-05 | kiwi
And I want the following result:
user | month_year | fruits | two_months_most_freq
-------------------------------------------------------------------------
1 | 2021-01 | apple, melon, orange | orange
1 | 2021-02 | grape, orange, kiwi | orange
1 | 2021-03 | grape, pear, banana | grape
1 | 2021-04 | orange, kiwi, banana | banana
1 | 2021-05 | grape, pear, kiwi | kiwi
Clearing: In the last column I want the most recurrent fruit in the last 2 months period, in other words, the most repeated in the actual and previous line. Notice that in the first line should return orange
because the window frame ahead should be used when the window frame behind is not available.
In the code below, I achieved the most recurrent fruit in the whole dataset.
select * from (
select user_id, year_month,
string_agg(distinct fruit) as fruits
from user
group by user_id, year_month
) join (
select user_id, fruit
from user
group by user_id, fruit
qualify 1 = row_number() over(partition by user_id order by count(*) desc)
)
using (user_id)
How can I apply this logic for specific time windows?
Upvotes: 2
Views: 65
Reputation: 173028
Consider below
select user_id, month_year, fruits,
if(prev_month_exists, two_months_most_freq, first_value(two_months_most_freq) over next_month) as two_months_most_freq
from (
select user_id, month_year, fruits,
( select fruit from unnest(split(two_month_fruits)) fruit
group by fruit order by count(*) desc limit 1
) as two_months_most_freq,
month, prev_month_exists
from (
select distinct user_id, month_year, month,
string_agg(fruit) over(partition by user_id, month_year) fruits,
string_agg(fruit) over last_two_months as two_month_fruits,
0 < count(*) over prev_month as prev_month_exists
from users, unnest([struct(
12 * extract(year from date(month_year || '-01')) + extract(month from date(month_year || '-01')) as month
)])
window
last_two_months as (partition by user_id order by month range between 1 preceding and current row),
prev_month as (partition by user_id order by month range between 1 preceding and 1 preceding)
)
)
window next_month as (partition by user_id order by month range between 1 following and 1 following)
if applied to sample data in your question - output is
Upvotes: 2