Reputation: 267
I have a table in SQL
with 1000 users, with top 10 entries:
table1
:
+-------------+---------------+-------------------------+
| User_id | Mail_id | Reg_date(y-m-d) |
+-------------+---------------+-------------------------+
| user1 | email1 | 2019-11-09 12:23:53.253 |
| user1 | email1 | 2019-11-09 12:24:53.253 |
| user1 | email1 | 2019-11-09 13:20:53.253 |
| user1 | email1 | 2019-08-09 11:23:53.253 |
| user2 | email2 | 2019-09-08 10:29:53.253 |
| user3 | email3 | 2019-09-08 14:23:53.253 |
| user1 | email1 | 2019-12-09 13:20:53.253 |
| user1 | email1 | 2019-10-10 11:23:53.253 |
| user1 | email1 | 2019-10-13 10:29:53.253 |
| user2 | email5 | 2019-11-14 10:29:53.253 |
+-------------+---------------+-------------------------+
table2
:
+-------------+---------------+-------------------------+
| User_id | Session_id | Activity_date(y-m-d) |
+-------------+---------------+-------------------------+
| user1 | s1 | 2019-11-09 12:23:53.253 |
| user1 | s2 | 2019-12-09 12:24:53.253 |
| user1 | s3 | 2019-12-09 13:20:53.253 |
| user1 | s4 | 2020-01-09 11:23:53.253 |
| user2 | s5 | 2019-12-08 10:29:53.253 |
| user3 | s6 | 2020-02-08 14:23:53.253 |
| user1 | s7 | 2019-12-09 13:20:53.253 |
| user1 | s8 | 2020-03-10 11:23:53.253 |
| user1 | s9 | 2020-02-13 10:29:53.253 |
| user2 | s10 | 2020-03-14 10:29:53.253 |
+-------------+---------------+-------------------------+
I want to select the DISTINCT users only if whose Activity_date(y-m-d)
starts in date between 2019-11-01
to 2019-12-15
AND present in every month between the date 2019-11-01
to 2020-03-30
( to track the user activity 3 consecutive months).
The output
:
(Here User1
is the only person whose Activity_date(y-m-d)
falls between 2019-11-01
to 2019-12-15
AND also present in every month after that, between the date 2019-11-01
to 2020-03-30
.
User2 has the starting Activity_date(y-m-d)
falls between 2019-11-01
to 2019-12-15
But the Activity_date(y-m-d)
not present in every month (i.e, Month-Jan & Feb) so this user not been considered in output.
+-------------+---------------+-------------------------+
| User_id | Mail_id | Activity_date(y-m-d) |
+-------------+---------------+-------------------------+
| user1 | email1 | 2019-11-09 12:23:53.253 |
| user1 | email1 | 2019-12-09 12:24:53.253 |
| user1 | email1 | 2019-12-09 13:20:53.253 |
| user1 | email1 | 2020-01-09 11:23:53.253 |
| user1 | email1 | 2019-12-09 14:20:53.253 |
| user1 | email1 | 2020-02-13 10:29:53.253 |
| user1 | email1 | 2020-03-10 11:23:53.253 |
+-------------+---------------+-------------------------+
How to achieve this in SQL (Redshift)?
Upvotes: 0
Views: 78
Reputation: 10035
Here is a fiddle using your sample data to recreate your sample results. The db fiddle to test with is using postgres but this should also work on redshift. Let me know if this works.
The approach begins by first generating all consecutive months with a recursive cte month_periods
and months
before checking whether a user was active in each consecutive year-month generated in users_active_in_months
. The final projection selects the User_id
, Mail_id
and Activity_date
as shared in your target dataset where Activity_date
is from 2019-11-01
to 2019-12-15
and 2019-11-01
to 2020-03-30
or simply from 2019-11-01
to 2020-03-30
since this is fully inclusive.
CREATE TABLE table1 (
User_id VARCHAR(5),
Mail_id VARCHAR(6),
Reg_date TIMESTAMP
);
INSERT INTO table1
(User_id, Mail_id, Reg_date)
VALUES
('user1', 'email1', '2019-11-09 12:23:53.253'),
('user1', 'email1', '2019-11-09 12:24:53.253'),
('user1', 'email1', '2019-11-09 13:20:53.253'),
('user1', 'email1', '2019-08-09 11:23:53.253'),
('user2', 'email2', '2019-09-08 10:29:53.253'),
('user3', 'email3', '2019-09-08 14:23:53.253'),
('user1', 'email1', '2019-12-09 13:20:53.253'),
('user1', 'email1', '2019-10-10 11:23:53.253'),
('user1', 'email1', '2019-10-13 10:29:53.253'),
('user2', 'email5', '2019-11-14 10:29:53.253');
CREATE TABLE table2 (
User_id VARCHAR(5),
Session_id VARCHAR(3),
Activity_date TIMESTAMP
);
INSERT INTO table2
(User_id, Session_id, Activity_date)
VALUES
('user1', 's1', '2019-11-09 12:23:53.253'),
('user1', 's2', '2019-12-09 12:24:53.253'),
('user1', 's3', '2019-12-09 13:20:53.253'),
('user1', 's4', '2020-01-09 11:23:53.253'),
('user2', 's5', '2019-12-08 10:29:53.253'),
('user3', 's6', '2020-02-08 14:23:53.253'),
('user1', 's7', '2019-12-09 13:20:53.253'),
('user1', 's8', '2020-03-10 11:23:53.253'),
('user1', 's9', '2020-02-13 10:29:53.253'),
('user2', 's10', '2020-03-14 10:29:53.253');
Query #1
WITH recursive month_periods AS (
SELECT '2019-11-01'::timestamp as dt UNION ALL
SELECT (dt + interval '1 month')::timestamp as dt
FROM month_periods
WHERE dt <= '2020-03-30'
),
months AS (
SELECT EXTRACT(YEAR FROM dt)*100+EXTRACT(MONTH from dt) as ym from month_periods
),
users_active_in_months AS (
SELECT
User_id
FROM (
SELECT DISTINCT
m.ym,
t2.User_id
FROM
months m
LEFT JOIN
table2 t2 ON
( EXTRACT(YEAR FROM t2.Activity_date)*100+EXTRACT(MONTH FROM t2.Activity_date))=m.ym
WHERE t2.User_id is NOT NULL
) t
GROUP BY User_id
HAVING COUNT(User_id) = (SELECT COUNT(1) FROM months) - 1
)
SELECT DISTINCT
t2.User_Id,
t1.Mail_id,
t2.Activity_date
FROM
table2 t2
INNER JOIN
table1 t1 ON t2.User_id = t1.User_id
INNER JOIN
users_active_in_months um ON um.User_id = t1.User_id
WHERE
t2.Activity_date BETWEEN '2019-11-01' and '2020-03-30';
user_id | mail_id | activity_date |
---|---|---|
user1 | email1 | 2019-11-09T12:23:53.253Z |
user1 | email1 | 2019-12-09T12:24:53.253Z |
user1 | email1 | 2019-12-09T13:20:53.253Z |
user1 | email1 | 2020-01-09T11:23:53.253Z |
user1 | email1 | 2020-02-13T10:29:53.253Z |
user1 | email1 | 2020-03-10T11:23:53.253Z |
Upvotes: 1
Reputation: 1269445
You can use aggregation. From your description, table1
doesn't seem needed. You can get the user_id
:
select t2.user_id
from table2 t2
group by user_id
having min(activity_date) >= '2019-11-01' and
min(activity_date) <= '2019-12-15' and
count(distinct date_trunc('month', activity_date)) = 5;
You can then join in whatever other information you need.
Note: The above answers the specific question you asked. However, because you want activity in every month, this really requires that the first date be in November, rather than December. You can tweak the logic to handle this.
Upvotes: 1