How to Count Distinct for SAS PROC SQL with Rolling Date Window?

Question

I have a SAS dataset that looks like the one below:

MEMBER_ID    VAR1    VAR2    DATE
1            12      5       01/04/2020
1            12      5       02/06/2020
1            16      5       04/14/2020
1            12      7       09/10/2020
2            10      5       02/20/2020
2            10      6       04/20/2020
2            10      5       04/25/2020
2            10      5       05/15/2020
3            15      3       01/15/2020
3            16      4       01/25/2020
4            10      5       05/15/2020
5            11      7       03/03/2020
5            12      8       04/03/2020
5            13      9       05/03/2020

My goal is to count the distinct values in VAR1 and VAR2 grouped by MEMBER_ID and a rolling date range of 180 days. So if the date in row 2 is within the 180 days of row 1 for member 1, then they will be counted (distinctly). My current code looks as follows:

PROC SQL;
   CREATE TABLE WORK.WANT AS 
   SELECT t1.MEMBER_ID, 
          t1.VAR1, 
          t1.VAR2, 
          t1.DATE, 
          /* var1Count */
            (COUNT(DISTINCT(t1.VAR1))) FORMAT=10. LABEL="var1Count " WHERE (t1 BETWEEN t1.DATE- 180 AND t1.DATE) AS var1Count , 
          /* var2Count */
            (COUNT(DISTINCT(t1.VAR2))) FORMAT=10. LABEL="var2Count " WHERE (t1 BETWEEN t1.DATE- 180 AND t1.DATE) AS var2Count , 
      FROM WORK.HAVE t1
      GROUP BY t1.MEMBER_ID
      HAVING (CALCULATED var1Count ) >= 2 AND (CALCULATED var2Count ) >= 2 
      ORDER BY t1.MEMBER_ID,
               t1.DATE;
QUIT;

But while I think this WHERE statement in the column calculation may work for regular SQL code, it's giving me errors here. Any other ideas? It may be that I need to do this COUNT(DISTINCT VAR) in a different SAS data step, but I'm unsure (and fairly new to SAS for that matter). Any help at all is greatly appreciated!

Gordon Linoff · Accepted Answer

I think you need to use correlated subqueries for this in SAS:

SELECT h.* ,
       (SELECT COUNT(DISTINCT h2.VAR1)
        FROM WORK.HAVE h2
        WHERE h2.MEMBER_ID = h.MEMBER_ID AND
              h2.DATE BETWEEN h.DATE - 180 AND h.DATE
       ) as var1count,
       (SELECT COUNT(DISTINCT h2.VAR2)
        FROM WORK.HAVE h2
        WHERE h2.MEMBER_ID = h.MEMBER_ID AND
              h2.DATE BETWEEN h.DATE - 180 AND h.DATE
       ) as var2count
FROM WORK.HAVE h;

If you want to filter on the counts, you can use a subquery.

How to Count Distinct for SAS PROC SQL with Rolling Date Window?

Answers (1)

Related Questions