DoubleT

Reputation: 13

Calculate a date difference within a group variable, over rows of missing observations

In SAS, I have ID, Date1, and Date2 sorted by ascending ID, Date1, and Date2. The sort causes the missing Date2 values to be where they are, as desired. How can I calculate the Date2 difference between rows with valid dates and obtain the results displayed in D_Date2?

In words: BY ID, skip any missing Date2 value, read the next valid Date2 below it, subtract the earlier valid date from the later one, and write the difference as D_Date2 on the row that holds the later valid Date2 value. Thanks.

Obs ID  Date1      Date2        D_Date2
1   1   20090815   20090818       .
2   1   20090815   20090818       0
3   1   20090816   20090820       2
4   1   20090816          .       .
5   1   20090816   20090820       0
6   2   20090101          .       .
7   2   20090105   20090105       .
8   2   20090105          .       .
9   2   20090105   20090106       1
10  2   20090105   20090110       4
11  3   20080720          .       .
12  3   20080720   20080917       .
13  3   20080720   20080918       1
14  3   20081010          .       .
15  3   20081010   20080925       7
16  3   20081010   20080925       0

Upvotes: 0

Views: 3982

Answers (1)

DWal

Reputation: 2762

I'm sure you could use retain, but I'd use the lag function. The key here is to understand that lag does not necessarily return the value from the previous row: it returns the value that was stored the last time that particular lag call executed. So if the call sits inside an if block, it returns the value from the last row where the condition was true.
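To see that behaviour in isolation, here is a minimal sketch on a made-up three-row dataset. The dataset names demo and demo_lag and the variables x, prev_row, and prev_valid are hypothetical and exist only for illustration. Each lag call keeps its own queue, which is updated only when that call actually runs.

```sas
/* Hypothetical toy data: x is missing on the second row */
data demo;
  input x;
  datalines;
10
.
30
;
run;

data demo_lag;
  set demo;
  /* Unconditional call: its queue is updated on every row */
  prev_row = lag(x);
  /* Conditional call: its queue is updated only when x is not missing */
  if not missing(x) then prev_valid = lag(x);
run;
```

On the third row, prev_row is missing (the previous row's x), while prev_valid is 10 (the last non-missing x).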

I like doing these things step-by-step for clarity. First I create a new variable ldate2 that contains the date that is to be subtracted to get the desired difference, then I perform the subtraction.

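data want;
  set have;
  /* lag runs only on rows with a valid date2, so it returns the previous valid date2 */
  if not missing(date2) then do;
    ldate2 = lag(date2);
    /* first valid date2 within an ID: there is nothing to subtract */
    if id ne lag(id) then ldate2 = .;
  end;
  d_date2 = date2 - ldate2;
run;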

As rbet suggests, using the dif function is simpler. dif(x) is equivalent to x - lag(x), so it behaves like lag here and there's no need to perform the subtraction separately:

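data want;
  set have;
  if not missing(date2) then do;
    /* current valid date2 minus the previous valid date2 */
    d_date2 = dif(date2);
    /* first valid date2 within an ID: no difference to report */
    if id ne lag(id) then d_date2 = .;
  end;
run;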
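
For comparison, a retain-based version might look like the sketch below. It assumes that Date1 and Date2 are true SAS date values (only displayed as yyyymmdd) rather than plain numbers such as 20090815, that have holds only ID, Date1, and Date2, and that it is sorted by ID. The first step just rebuilds the question's table so the sketch can be run on its own. The names want_retain and last_date2 are made up for illustration.

```sas
/* Rebuild the question's table as HAVE (assumes Date1/Date2 are SAS dates) */
data have;
  input id date1 :yymmdd8. date2 :yymmdd8.;
  format date1 date2 yymmddn8.;
  datalines;
1 20090815 20090818
1 20090815 20090818
1 20090816 20090820
1 20090816 .
1 20090816 20090820
2 20090101 .
2 20090105 20090105
2 20090105 .
2 20090105 20090106
2 20090105 20090110
3 20080720 .
3 20080720 20080917
3 20080720 20080918
3 20081010 .
3 20081010 20080925
3 20081010 20080925
;
run;

data want_retain;
  set have;
  by id;
  /* last_date2 (hypothetical helper) keeps the last valid date2 across rows */
  retain last_date2;
  /* forget the previous ID's date at the start of each ID */
  if first.id then last_date2 = .;
  if not missing(date2) then do;
    /* only subtract when an earlier valid date2 exists in this ID */
    if not missing(last_date2) then d_date2 = date2 - last_date2;
    last_date2 = date2;
  end;
  drop last_date2;
run;
```

This should give the same D_Date2 values as the table in the question. The lag and dif versions are shorter, while the retain version makes the reset at each new ID explicit.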

Upvotes: 1
