Ranyk
Ranyk

Reputation: 267

SQL: Selecting Distinct users and groupby based on condition

I have a DB table:

User_id | User Name | Join_date  | Activity_date |  
1           abc       01/01/2021    02-01-2021
2           jay       01/01/2021    03-01-2021
2           jay       01/01/2021    04-01-2021
1           abc       01/01/2021    09-01-2021
1           abc       01/01/2021    16-01-2021 
2           jay       01/01/2021    05-01-2021
3           xyz       03/03/2021    12-03-2021
3           xyz       03/03/2021    30-03-2021
2           jay       01/01/2021    26-01-2021

I want to bucketize the user based on their Activity_date, as in users who do the activity every next day into table1, users who do activity weekly(gap of 6-7 days) into table2, all the other users into table3.

The output something like this:

temporary_table1: (people who do an activity every next continues day)

User_id | User Name | Join_date  | Activity_date |  
2           jay       01/01/2021    03-01-2021
2           jay       01/01/2021    04-01-2021
2           jay       01/01/2021    05-01-2021

temporary_table_2: (people who do activist every 6-7 days from their first Activity_date)

User_id | User Name | Join_date  | Activity_date |  
1           abc       01/01/2021    02-01-2021
1           abc       01/01/2021    09-01-2021
1           abc       01/01/2021    15-01-2021 

temporary_table_3: (Adding user_id =2 here because from the last activity done by this user to the recent activity done there is more than seven days of gap)

User_id | User Name | Join_date  | Activity_date |  
3           xyz       03/03/2021    12-03-2021
3           xyz       03/03/2021    30-03-2021
2           jay       01/01/2021    26-01-2021

How to achieve this in SQL (Redshift)?

Upvotes: 0

Views: 59

Answers (1)

ABofill
ABofill

Reputation: 26

You should be able to do something like this and then you can make your grouping on the date_diff:

SELECT
user_id,
user_name,
join_date
activity_date,

#window function
LAG(activity_date) OVER (PARTITION BY user_id ORDER BY activity_date ASC) as day_before,

#difference in days between activity_date and day_before
DATEDIFF(
    day, 
    LAG(activity_date) OVER (PARTITION BY user_id ORDER BY activity_date ASC),
    activity_date
    ) as date_diff
FROM your_dataset

Upvotes: 1

Related Questions