Reputation: 468
I have a SAS dataset that looks like the one below:
MEMBER_ID VAR1 VAR2 DATE
1 12 5 01/04/2020
1 12 5 02/06/2020
1 16 5 04/14/2020
1 12 7 09/10/2020
2 10 5 02/20/2020
2 10 6 04/20/2020
2 10 5 04/25/2020
2 10 5 05/15/2020
3 15 3 01/15/2020
3 16 4 01/25/2020
4 10 5 05/15/2020
5 11 7 03/03/2020
5 12 8 04/03/2020
5 13 9 05/03/2020
My goal is to count the distinct values in VAR1
and VAR2
grouped by MEMBER_ID
and a rolling date range of 180 days. So if the date in row 2 is within the 180 days of row 1 for member 1, then they will be counted (distinctly). My current code looks as follows:
PROC SQL;
CREATE TABLE WORK.WANT AS
SELECT t1.MEMBER_ID,
t1.VAR1,
t1.VAR2,
t1.DATE,
/* var1Count */
(COUNT(DISTINCT(t1.VAR1))) FORMAT=10. LABEL="var1Count " WHERE (t1 BETWEEN t1.DATE- 180 AND t1.DATE) AS var1Count ,
/* var2Count */
(COUNT(DISTINCT(t1.VAR2))) FORMAT=10. LABEL="var2Count " WHERE (t1 BETWEEN t1.DATE- 180 AND t1.DATE) AS var2Count ,
FROM WORK.HAVE t1
GROUP BY t1.MEMBER_ID
HAVING (CALCULATED var1Count ) >= 2 AND (CALCULATED var2Count ) >= 2
ORDER BY t1.MEMBER_ID,
t1.DATE;
QUIT;
But while I think this WHERE
statement in the column calculation may work for regular SQL code, it's giving me errors here. Any other ideas? It may be that I need to do this COUNT(DISTINCT VAR)
in a different SAS data
step, but I'm unsure (and fairly new to SAS for that matter). Any help at all is greatly appreciated!
Upvotes: 1
Views: 1533
Reputation: 1270003
I think you need to use correlated subqueries for this in SAS:
SELECT h.* ,
(SELECT COUNT(DISTINCT h2.VAR1)
FROM WORK.HAVE h2
WHERE h2.MEMBER_ID = h.MEMBER_ID AND
h2.DATE BETWEEN h.DATE - 180 AND h.DATE
) as var1count,
(SELECT COUNT(DISTINCT h2.VAR2)
FROM WORK.HAVE h2
WHERE h2.MEMBER_ID = h.MEMBER_ID AND
h2.DATE BETWEEN h.DATE - 180 AND h.DATE
) as var2count
FROM WORK.HAVE h;
If you want to filter on the counts, you can use a subquery.
Upvotes: 2