tylercomp
tylercomp

Reputation: 901

How to determine if two records are 1 year apart (using a timestamp)

I need to analyze some weblogs and determine if a user has visited once, taken a year break, and visited again. I want to add a flag to every row (Y/N) with a VisitId that meets the above criteria.

How would I go about creating this sql?

Here are the fields I have, that I think need to be used (by analyzing the timestamp of the first page of each visit):

select VisitID, UserID, TimeStamp from page_view_t where pageNum = 1;

thanks - any help would be greatly appreciated.

Upvotes: 1

Views: 957

Answers (3)

Raghuram Duraisamy
Raghuram Duraisamy

Reputation: 136

Assuming page_view_t table stores UserID and TimeStamp details of each visit of the user, the following query will return users who have visited taking a break of at least an year (365 days) between two consecutive visits.

select t1.UserID
from page_view_t t1
where (
    select datediff(day, max(t2.[TimeStamp]), t1.[TimeStamp])
    from page_view_t t2
    where t2.UserID = t1.UserID and t2.[TimeStamp] < t1.[TimeStamp]
    group by t2.UserID
) >= 365

Upvotes: 1

RealUlysse
RealUlysse

Reputation: 31

The other guy was faster but since I took time to do it and it's a completely different approach I might as well post It :D.

SELECT pv2.VisitID,
       pv2.UserID,  
       pv2.TimeStamp, 
       CASE WHEN pv1.VisitID IS NOT NULL 
             AND pv3.VisitID IS NULL
       THEN 'YES' ELSE 'NO' END AS IsReturnVisit
FROM page_view_t pv2
LEFT JOIN page_view_t pv1 ON pv1.UserID = pv2.UserID
                          AND pv1.VisitID <> pv2.VisitID
                          AND (pv1.TimeStamp <= DATEADD(YEAR, -1, pv2.TimeStamp)
                               OR pv2.TimeStamp <= DATEADD(YEAR, -1, pv1.TimeStamp))
                          AND pv1.pageNum = 1
LEFT JOIN page_view_t pv3 ON pv1.UserID = pv3.UserID
                          AND (pv3.TimeStamp BETWEEN pv1.TimeStamp AND pv2.TimeStamp
                               OR pv3.TimeStamp BETWEEN pv2.TimeStamp AND pv1.TimeStamp)
                          AND pv3.pageNum = 1
WHERE pv2.pageNum = 1

Upvotes: 1

Andriy M
Andriy M

Reputation: 77687

You could rank every user's rows, then join the ranked row set to itself to compare adjacent rows:

;
WITH ranked AS (
  SELECT
    *,
    rnk = ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY TimeStamp)
  FROM page_view_t
),
flagged AS (
  SELECT
    *,
    IsReturnVisit = CASE
      WHEN EXISTS (
        SELECT *
        FROM ranked
        WHERE UserID = r.UserID
          AND rnk = r.rnk - 1
          AND TimeStamp <= DATEADD(YEAR, -1, r.TimeStamp)
      )
      THEN 'Y'
      ELSE 'N'
    END
  FROM ranked r
)
SELECT
  VisitID,
  UserID,
  TimeStamp,
  IsReturnVisit
FROM flagged

Note: the above flags only return visits.

UPDATE

To flag the first visits same as return visits, the flagged CTE could be modified as follows:

…
SELECT
  *,
  IsFirstOrReturnVisit = CASE
    WHEN p.UserID IS NULL OR r.TimeStamp >= DATEADD(YEAR, 1, p.TimeStamp)
    THEN 'Y'
    ELSE 'N'
  END
FROM ranked r
  LEFT JOIN ranked p ON r.UserID = p.UserID AND r.rnk = p.rnk + 1
…

References that might be useful:

Upvotes: 5

Related Questions