Erin
Erin

Reputation: 81

Within a sas by statement, subtract one observation from its lag

I have a SAS data set grouped by clusters, as follows

data have;
    input cluster date date9.;
cards;
1 1JAN2017
1 2JAN2017
1 7JAN2017
2 1JAN2017
2 3JAN2017
2 10JAN2017
;
run;

Within each cluster, I'd like to subtract a date from it's previous date, so I have the dataset below:

data want;
input cluster date date_diff;
cards;
1 1JAN2017 0
1 2JAN2017 1
1 7JAN2017 5
2 1JAN2017 0
2 3JAN2017 2
2 10JAN2017 7
;
run;

I think perhaps I should be using a lag function similar to what I have written below.

DATA test; 
  SET have; 
   BY cluster; 
   if first.cluster then do;
   date_diff = date - lag(date);
  END; 
RUN; 

Any advice would be appreciated! Thanks

Upvotes: 0

Views: 284

Answers (2)

Joe
Joe

Reputation: 63424

I like dif for this (lag plus subtract in one function). You have the if first backwards, I think, but dif and lag have the same restriction - what they're really doing is building a queue, so the lag or dif statement cannot be conditionally executed for most use cases. Here I flip it around and calculate the dif, then set it to missing if on first.cluster.

I also encourage you to use missing, not 0, for the first.cluster dif.

DATA test; 
  SET have; 
   BY cluster; 
   date_diff = dif(date);
   if first.cluster then call missing(date_diff);
RUN; 

Upvotes: 1

Quentin
Quentin

Reputation: 6378

Conditional lags are tricky. It this case (and in many), you don't actually want a conditional lag. You want to compute the lag for every record, and then you can use it conditionally. One way is:

data want ;
  set have ;
  by cluster ;
  lagdate=lag(date) ;
  if first.cluster then date_diff=0 ;
  else date_diff=date-lagdate ;
run ;

So you compute lagdate for every record. Then you can conditionally compute date_diff.

Upvotes: 0

Related Questions