Ahmad M.
Ahmad M.

Reputation: 524

Comparing with other rows in table

Hi brilliant thinkers,

I want to create a CASE condition to give me a "yes" for active_users that is if there exists within 60 days, a more recent uuid_ts for the same anonymous_id.

SELECT  t1.anonymous_id user_id,
        t1.uuid_ts activity_date, 
        t2.uuid_ts signup_date,

        -- Activity Lifetime: difference of number of days signed up to last activity
        DATE_DIFF(CAST(t2.uuid_ts AS DATE), CAST(t1.uuid_ts AS DATE), DAY) AS activity_lifetime,

        -- New Users: If month of activity is same as sign_up month
        (CASE WHEN DATE_DIFF(CAST(t1.uuid_ts AS DATE), CAST(t2.uuid_ts AS DATE), MONTH)=0 THEN TRUE ELSE FALSE END) AS new_user,

        -- Active Users: If month of activity is greater than sign_up month AND activity is found 
        (CASE WHEN DATE_DIFF(CAST(t1.uuid_ts AS DATE), CAST(t2.uuid_ts AS DATE), MONTH)>0 
        -- ** ____ NEED HELP HERE ____ **
         AND anonymous_id NOT IN (SELECT anonymous_id FROM datascience.last_user_activity)
         AND DATE_ADD(activity_date, INTERVAL 60 DAY) > (S)


FROM datascience.last_user_activity AS t1
INNER JOIN datascience.full_signup_completed AS t2
ON t2.anonymous_id = t1.anonymous_id
WHERE DATE(t1.uuid_ts) IS NOT NULL AND DATE(t2.uuid_ts) IS NOT NULL
ORDER BY activity_lifetime DESC

SAMPLE DATA:

anon_id|signup_date|activity_date|
__________________________________
123    |01-01-2019 |02-01-2019   | 
123    |01-01-2019 |02-02-2019   | 
123    |01-01-2019 |02-03-2019   | 
123    |01-01-2019 |02-04-2019   |

WANTED:

anon_id|signup_date|activity_date| active
__________________________________
123    |01-01-2019 |02-01-2019   | yes
123    |01-01-2019 |02-02-2019   | yes
123    |01-01-2019 |02-03-2019   | no
123    |01-01-2019 |02-04-2019   | no

if a future date exists in the same row, within the range of 60 days, then the field active shows "yes", else a "no".

Upvotes: 0

Views: 103

Answers (2)

ch_mike
ch_mike

Reputation: 1576

Still not 100% sure this is what you are looking for, but I hope it helps:

WITHIN 60 days:

(The output would be "yes, yes, yes, no" since 02-04-2019 > 02-03-2019 and within 60 days)

WITH
  sample_data AS (
  SELECT
    '123' AS anon_id,    DATE('2019-01-01') AS signup_date,
    DATE('2019-01-02') AS activity_date
  UNION ALL
  SELECT
    '123' AS anon_id,
    DATE('2019-01-01') AS signup_date,
    DATE('2019-02-02') AS activity_date
  UNION ALL
  SELECT
    '123' AS anon_id,
    DATE('2019-01-01') AS signup_date,
    DATE('2019-03-02') AS activity_date
  UNION ALL
  SELECT
    '123' AS anon_id,
    DATE('2019-01-01') AS signup_date,
    DATE('2019-04-02') AS activity_date)
SELECT
  anon_id,
  signup_date,
  activity_date,
  (CASE
      WHEN EXISTS( SELECT 'found' FROM sample_data t2 WHERE t2.anon_id = t1.anon_id AND t2.activity_date > t1.activity_date AND t2.activity_date <= DATE_ADD(t1.activity_date, INTERVAL 60 DAY)) THEN 'yes'
    ELSE
    'no'
  END
    ) AS active
FROM
  sample_data t1
ORDER BY 1,2,3

60 DAYS or BEYOND:

(The output would be "yes, no, no, no" since February has 28 days and March 31, so between 02-02-2019 and 02-04-2019 there are 59 days)

WITH
  sample_data AS (
  SELECT
    '123' AS anon_id,
    DATE('2019-01-01') AS signup_date,
    DATE('2019-01-02') AS activity_date
  UNION ALL
  SELECT
    '123' AS anon_id,
    DATE('2019-01-01') AS signup_date,
    DATE('2019-02-02') AS activity_date
  UNION ALL
  SELECT
    '123' AS anon_id,
    DATE('2019-01-01') AS signup_date,
    DATE('2019-03-02') AS activity_date
  UNION ALL
  SELECT
    '123' AS anon_id,
    DATE('2019-01-01') AS signup_date,
    DATE('2019-04-02') AS activity_date)
SELECT
  anon_id,
  signup_date,
  activity_date,
  (CASE
      WHEN EXISTS( SELECT 'found' FROM sample_data t2 WHERE t2.anon_id = t1.anon_id AND t2.activity_date >= DATE_ADD(t1.activity_date, INTERVAL 60 DAY)) THEN 'yes'
    ELSE
    'no'
  END
    ) AS active
FROM
  sample_data t1
ORDER BY 1,2,3

Upvotes: 2

rtenha
rtenha

Reputation: 3616

Your question/logic/dates are a bit unclear, but I think the following query should point you in the right direction.

with joined as (
  -- Join your tables and handle casting here (only have to do it once)
  select
    anonymous_id, 
    date(full_signup_completed.uuid_ts) as signup_date,
    extract(month from full_signup_completed.uuid_ts) as signup_month,
    date(last_user_activity.uuid_ts) as activity_date,
    extract(month from last_user_activity.uuid_ts) as activity_month
  from datascience.full_signup_completed
  left join datascience.last_user_activity using(anonymous_id)
  where full_signup_completed.uuid_ts is not null and last_user_activity.uuid_ts is not null
),
activity60 as (
  -- for each activity date, is there a future activity date within 60 days?
  select j1.anonymous_id,j1.activity_date, true as has_activity_within_60_days
  from joined j1
  cross join joined j2
  where j1.anonymous_id = j2.anonymous_id and date_diff(j2.activity_date, j1.activity_date, day) <= 60
  group by 1,2
),
final as (
  -- Get all of your logic
  select
    joined.*,
    date_diff(activity_date,signup_date, day) as activity_lifetime,
    signup_month = activity_month as new_user, -- Evaluates to T/F
    (activity_month > signup_month) and has_activity_within_60_days as your_custom_field -- Evaluates to aT/F
  from joined
  inner join activity60 using(anonymous_id,activity_date)
)
select * from final
order by activity_lifetime desc

In your example, are your dates in DD-MM-YYYY format? If not, i'm not sure how the 60 day constraint makes sense.

Upvotes: 1

Related Questions