Reputation: 81
I have a SAS data set grouped by clusters, as follows
data have;
input cluster date date9.;
cards;
1 1JAN2017
1 2JAN2017
1 7JAN2017
2 1JAN2017
2 3JAN2017
2 10JAN2017
;
run;
Within each cluster, I'd like to subtract a date from it's previous date, so I have the dataset below:
data want;
input cluster date date_diff;
cards;
1 1JAN2017 0
1 2JAN2017 1
1 7JAN2017 5
2 1JAN2017 0
2 3JAN2017 2
2 10JAN2017 7
;
run;
I think perhaps I should be using a lag function similar to what I have written below.
DATA test;
SET have;
BY cluster;
if first.cluster then do;
date_diff = date - lag(date);
END;
RUN;
Any advice would be appreciated! Thanks
Upvotes: 0
Views: 284
Reputation: 63424
I like dif
for this (lag plus subtract in one function). You have the if first
backwards, I think, but dif
and lag
have the same restriction - what they're really doing is building a queue, so the lag
or dif
statement cannot be conditionally executed for most use cases. Here I flip it around and calculate the dif
, then set it to missing if on first.cluster
.
I also encourage you to use missing, not 0, for the first.cluster
dif.
DATA test;
SET have;
BY cluster;
date_diff = dif(date);
if first.cluster then call missing(date_diff);
RUN;
Upvotes: 1
Reputation: 6378
Conditional lags are tricky. It this case (and in many), you don't actually want a conditional lag. You want to compute the lag for every record, and then you can use it conditionally. One way is:
data want ;
set have ;
by cluster ;
lagdate=lag(date) ;
if first.cluster then date_diff=0 ;
else date_diff=date-lagdate ;
run ;
So you compute lagdate for every record. Then you can conditionally compute date_diff.
Upvotes: 0