mateoc15
mateoc15

Reputation: 654

Aggregate with lead/lag

I have a table called activities which has a memberId and a timestamp. I would like to find out in a given month how many members performed an activity (ie - had a record in the activities table) who did not have an activity in the previous 12 months. I believe lead/lag would be helpful here, but I'm having trouble wrapping my brain around it.

(I've tagged both Apache Hadoop and MS SQL Server here because I could potentially do this in either, and I think I could translate a solution for one to the other pretty easily).

Any help appreciated!

Thank you!

Upvotes: 0

Views: 2874

Answers (2)

Wolfgang Kais
Wolfgang Kais

Reputation: 4100

When using the LAG function, we need to create a single record per member and month first, use the LAG function to get the preious activity month and finally use a where clause to only get what we want:

DECLARE 
  @year int = 2018, 
  @month int = 7;

WITH
  monthwise (MemberID, FirstOfMonth) AS (
    SELECT DISTINCT  MemberID, DATEADD(month, DATEDIFF(month, 0, ActivityDate), 0)
    FROM Activities
  ),
  prevActivity (MemberID, FirstOfMonth, prevFirstOfMonth) AS (
    SELECT MemberID, FirstOfMonth
    , LAG(FirstOfMonth) OVER (PARTITION BY MemberID ORDER BY FirstOfMonth)
    FROM monthwise
  )
SELECT MemberID
FROM prevActivity
WHERE MONTH(FirstOfMonth) = @month
  AND YEAR(FirstOfMonth) = @year
  AND (prevFirstOfMonth IS NULL OR DATEDIFF(month, prevFirstOfMonth, FirstOfMonth) > 12)

You can also do this without the LAG function: Use two queries, one for the members that have an activity in this month, one for the members that have an activity in the previous twelve months. Then use inner and left joins to find the members with activities in this month without activities in the previous months:

DECLARE 
  @year int = 2018, 
  @month int = 7;

WITH
  this (MemberID) AS (
    SELECT DISTINCT MemberID
    FROM Activities 
    WHERE YEAR(ActivityDate) = @year
      AND MONTH(ActivityDate) = @month
  ),
  prev (MemberID) AS (
    SELECT DISTINCT MemberID
    FROM Activities
    WHERE ActivityDate < DATEADD(month, @month-1 +12*(@year-1900), 0)
      AND ActivityDate >= DATEADD(month, @month-1 +12*(@year-1901), 0)
  )
SELECT m.MemberID
FROM Members m
  INNER JOIN this ON m.MemberID = this.MemberID
  LEFT JOIN prev ON m.MemberID = prev.MemberID
WHERE prev.MemberID IS NULL

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1271151

You can do this using lag():

select year(ts), month(ts),
       (count(distinct memberid) - 
        count(distinct case when prev_ts > dateadd(year, -1, ts) then memberid)
       ) as 
from (select memberid, 
             lag(ts) over (partition by memberid order by ts) as prev_ts
      from activities a
     ) a
group by year(ts), month(ts);

Upvotes: -1

Related Questions