NebulousReveal
NebulousReveal

Reputation: 572

Difference in dates in SAS by group

Consider a data set test in the following form:

 Group   Date
  1      05JAN2014
  1      08JAN2014
  1      14JAN2014
  2      05JAN2013
  2      10FEB2015
  2      27FEB2015

I want to calculate the difference in dates based on group. The code below takes the difference between every two dates:

  data test;
  datediff = dif(Date);
  run;

How would I only take the difference between dates in one group? Moreover, is there a way to take a difference between the last and first dates in each group?

Upvotes: 3

Views: 5476

Answers (3)

Joe
Joe

Reputation: 63434

Starting with this:

 data test;
  datediff = dif(Date);
  run;

Let's address your isues one at a time. First, adding a set statement and a by statement, we can also add first and last to allow you to determine where you are in the group. This assumes it's already sorted by group.

data test2;
  set test;
  by group;
  datediff=dif(date);
run;

This doesn't work any differently (assuming you had the set statement originally, anyway). But now, you have some new options.

First, while you can use dif, I recommend the retain method for this. You can more easily see what it's doing, and avoid some common pitfalls: particularly, lag and dif don't actually compare to a previous record - they create a queue and compare to that, which can lead to complications when using conditional statements.

data test2;
  set test;
  by group;
  retain last_date;
  if first.group then last_date=0;
  datediff = date - last_date;
  output;
  last_date = date;
run;

This does the same thing as before - compares the previous value to the current value - but makes it a bit easier to see, and we add in an option to reset the last_date variable when first.group is true - meaning we're on the first row of a new value of group. I won't drop any of these intermediate variables, but in production code you can and should. retain means the value will persist across rows (instead of being reset every time you get a new row).

Now that you have one variable tracking the previous row's value of date, it's pretty easy hopefully to see how we can also do this for the first->last difference.

data test2;
  set test;
  by group;
  retain last_date orig_date;
  if first.group then do;
    last_date=0;
    orig_date=date;  **new;
  end;
  datediff = date - last_date; 
  if last.group then group_datediff = date-orig_date;  **new;
  output;
  last_date = date;
run;

Now we've done the same thing as before - but we're resetting the orig_date each time we see first.group and calculating group_datediff when we hit last.group.

Upvotes: 6

Tim Sands
Tim Sands

Reputation: 1068

Here is a method using lag and the new ifn function, (ifn is included in SAS 9.2 or later). Be a little careful with the lag function, as it can sometimes produce some unexpected results. See this paper for more detail on it.

*Data must be sorted to use BY groups;
proc sort data=have; by group date;run;

data want;
    set have;
    by group;
    dateDiff = ifn(first.group, . , dif(date));

    retain firstDate;
    if first.group then firstDate = date;
    if last.group then dateDiff_all = date - firstDate;
run;

Upvotes: 3

KnowYourOnion
KnowYourOnion

Reputation: 201

Here is a solution using proc sql and a group by statement with summary functions min and max.

This will get you the difference in each group for the first and last dates.

Technically it will get you the difference in the smallest and largest dates, but you dates are in chronological order so it will work for this data.

proc sql;
    create table want as select
    group,
    max(date) - min(date) as datediff
    from have 
    group by group;
quit;

Upvotes: 2

Related Questions