Reputation: 524
Hi brilliant thinkers,
I want to create a CASE condition to give me a "yes" for active_users that is if there exists within 60 days, a more recent uuid_ts for the same anonymous_id.
SELECT t1.anonymous_id user_id,
t1.uuid_ts activity_date,
t2.uuid_ts signup_date,
-- Activity Lifetime: difference of number of days signed up to last activity
DATE_DIFF(CAST(t2.uuid_ts AS DATE), CAST(t1.uuid_ts AS DATE), DAY) AS activity_lifetime,
-- New Users: If month of activity is same as sign_up month
(CASE WHEN DATE_DIFF(CAST(t1.uuid_ts AS DATE), CAST(t2.uuid_ts AS DATE), MONTH)=0 THEN TRUE ELSE FALSE END) AS new_user,
-- Active Users: If month of activity is greater than sign_up month AND activity is found
(CASE WHEN DATE_DIFF(CAST(t1.uuid_ts AS DATE), CAST(t2.uuid_ts AS DATE), MONTH)>0
-- ** ____ NEED HELP HERE ____ **
AND anonymous_id NOT IN (SELECT anonymous_id FROM datascience.last_user_activity)
AND DATE_ADD(activity_date, INTERVAL 60 DAY) > (S)
FROM datascience.last_user_activity AS t1
INNER JOIN datascience.full_signup_completed AS t2
ON t2.anonymous_id = t1.anonymous_id
WHERE DATE(t1.uuid_ts) IS NOT NULL AND DATE(t2.uuid_ts) IS NOT NULL
ORDER BY activity_lifetime DESC
SAMPLE DATA:
anon_id|signup_date|activity_date|
__________________________________
123 |01-01-2019 |02-01-2019 |
123 |01-01-2019 |02-02-2019 |
123 |01-01-2019 |02-03-2019 |
123 |01-01-2019 |02-04-2019 |
WANTED:
anon_id|signup_date|activity_date| active
__________________________________
123 |01-01-2019 |02-01-2019 | yes
123 |01-01-2019 |02-02-2019 | yes
123 |01-01-2019 |02-03-2019 | no
123 |01-01-2019 |02-04-2019 | no
if a future date exists in the same row, within the range of 60 days, then the field active shows "yes", else a "no".
Upvotes: 0
Views: 103
Reputation: 1576
Still not 100% sure this is what you are looking for, but I hope it helps:
WITHIN 60 days:
(The output would be "yes, yes, yes, no" since 02-04-2019 > 02-03-2019 and within 60 days)
WITH
sample_data AS (
SELECT
'123' AS anon_id, DATE('2019-01-01') AS signup_date,
DATE('2019-01-02') AS activity_date
UNION ALL
SELECT
'123' AS anon_id,
DATE('2019-01-01') AS signup_date,
DATE('2019-02-02') AS activity_date
UNION ALL
SELECT
'123' AS anon_id,
DATE('2019-01-01') AS signup_date,
DATE('2019-03-02') AS activity_date
UNION ALL
SELECT
'123' AS anon_id,
DATE('2019-01-01') AS signup_date,
DATE('2019-04-02') AS activity_date)
SELECT
anon_id,
signup_date,
activity_date,
(CASE
WHEN EXISTS( SELECT 'found' FROM sample_data t2 WHERE t2.anon_id = t1.anon_id AND t2.activity_date > t1.activity_date AND t2.activity_date <= DATE_ADD(t1.activity_date, INTERVAL 60 DAY)) THEN 'yes'
ELSE
'no'
END
) AS active
FROM
sample_data t1
ORDER BY 1,2,3
60 DAYS or BEYOND:
(The output would be "yes, no, no, no" since February has 28 days and March 31, so between 02-02-2019 and 02-04-2019 there are 59 days)
WITH
sample_data AS (
SELECT
'123' AS anon_id,
DATE('2019-01-01') AS signup_date,
DATE('2019-01-02') AS activity_date
UNION ALL
SELECT
'123' AS anon_id,
DATE('2019-01-01') AS signup_date,
DATE('2019-02-02') AS activity_date
UNION ALL
SELECT
'123' AS anon_id,
DATE('2019-01-01') AS signup_date,
DATE('2019-03-02') AS activity_date
UNION ALL
SELECT
'123' AS anon_id,
DATE('2019-01-01') AS signup_date,
DATE('2019-04-02') AS activity_date)
SELECT
anon_id,
signup_date,
activity_date,
(CASE
WHEN EXISTS( SELECT 'found' FROM sample_data t2 WHERE t2.anon_id = t1.anon_id AND t2.activity_date >= DATE_ADD(t1.activity_date, INTERVAL 60 DAY)) THEN 'yes'
ELSE
'no'
END
) AS active
FROM
sample_data t1
ORDER BY 1,2,3
Upvotes: 2
Reputation: 3616
Your question/logic/dates are a bit unclear, but I think the following query should point you in the right direction.
with joined as (
-- Join your tables and handle casting here (only have to do it once)
select
anonymous_id,
date(full_signup_completed.uuid_ts) as signup_date,
extract(month from full_signup_completed.uuid_ts) as signup_month,
date(last_user_activity.uuid_ts) as activity_date,
extract(month from last_user_activity.uuid_ts) as activity_month
from datascience.full_signup_completed
left join datascience.last_user_activity using(anonymous_id)
where full_signup_completed.uuid_ts is not null and last_user_activity.uuid_ts is not null
),
activity60 as (
-- for each activity date, is there a future activity date within 60 days?
select j1.anonymous_id,j1.activity_date, true as has_activity_within_60_days
from joined j1
cross join joined j2
where j1.anonymous_id = j2.anonymous_id and date_diff(j2.activity_date, j1.activity_date, day) <= 60
group by 1,2
),
final as (
-- Get all of your logic
select
joined.*,
date_diff(activity_date,signup_date, day) as activity_lifetime,
signup_month = activity_month as new_user, -- Evaluates to T/F
(activity_month > signup_month) and has_activity_within_60_days as your_custom_field -- Evaluates to aT/F
from joined
inner join activity60 using(anonymous_id,activity_date)
)
select * from final
order by activity_lifetime desc
In your example, are your dates in DD-MM-YYYY
format? If not, i'm not sure how the 60 day constraint makes sense.
Upvotes: 1