Neal801
Neal801

Reputation: 307

SAS-identifying three preceding years

I use following code to identify and output three preceding years. To give more details, the sample includes multiple Person ID and each Person ID have observations in multiple years. The final sample will only keep the Person ID who at least have three preceding years (e.g. 2001 2002 2003).

data have3 ;
  set have2;
  by personid;
  set have2 ( firstobs = 2 keep = year rename = (year = _year2) )

      have2 (      obs = 1 drop = _all_                        );

  _year2 = ifn(  last.personid, (.), _year2 ); /*output the value of next year*/

  set have2 ( firstobs = 3 keep = year rename = (year = _year3) )

      have2 (      obs = 2 drop = _all_                        );

  _year3 = ifn(  last.personid, (.), _year3 );  /*output the value of the year after the next year*/

  _prev1 = ifn( first.personid, (.), lag(year) ); /*output the value of previous year*/

  _prev2 = ifn( first.personid, (.), lag2(year) );/*output the value of the year before the previous year*/


  if (year-2 eq _prev1-1 eq _prev2) or

     (year+2 eq _year2+1 eq _year3) or

     (year eq _year2-1 eq _prev1+1) then output;

run;

This code is fine in most situations. However, my sample has some tricky situations. The following figure show one of the situation. Person ID 488 only have two observations (Year 1994 and 1995). Unfortunately, the first year of the next Person ID 489 is 1996. Hence, the _year3 of ID488 Year1994 is 1996 that make year+2 eq _year2+1 eq _year3 become true. As the result, ID488 Year1994 also output to the final sample. How could I improve the code to avoid this situation? Thanks!

enter image description here

Upvotes: 0

Views: 68

Answers (2)

Richard
Richard

Reputation: 27536

A SQL query might be more clear:

%* group has at least 3 years;
create table want as
select *
from have
group by id
having count(distinct year) >= 3

A DOW loop can compute a state variable that indicates the group had a run of 3 years somewhere within

%* group has a run of 3 years somewhere in time;
data want;
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
    if _n_ >= 3 and lag2(year)+1 = lag(year) and lag(year)+1 = year then 
      _group_has_3_consective_years = 1;
  end;

  do _n_ = 1 to _n_;
    set have;
    if _group_has_3_consective_years then OUTPUT;
  end;

  drop _:;
run;

A DOW loop can also compute state variables for the more complex case of wanting data from groups having a run of 3 years that ends at the last year of the group.

%* group has a run of 3 years finishing at end time;
data want_3_ending;
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
    if _n_ >= 3 and lag2(year)+1 = lag(year) and lag(year)+1 = year then do;
      _group_has_3_consective_years = 1;
      _end_at_year = year;
    end;
  end;

  _top_year = year;

  do _n_ = 1 to _n_;
    set have;
    if _group_has_3_consective_years and _end_at_year = _top_year then OUTPUT;
  end;

  drop _:;
run;

NOTE: The LAG invocations are always resolved because SAS if statements do NOT do shortcut logic evaluation. The _N_ >= 3 ensures the lags are all from the same group.

Upvotes: 0

Tom
Tom

Reputation: 51621

It is probably going to be easier to just keep the id variable also in your look ahead SET statements.

set have2(firstobs=2 keep=personid year rename=(personid=personid2 year=_year2))...
set have2(firstobs=3 keep=personid year rename=(personid=personid3 year=_year3))...

Then you can make sure that the next and next-next records are actually for the same PERSONID.

if personid ne personid2 then _year2=.;
if personid ne personid3 then _year3=.;

Same thing for your look back variables.

_prev1=lag(year);
_prev2=lag2(year);
if personid ne lag(personid) then _prev1=.;
if personid ne lag2(personid) then _prev2=.;

Upvotes: 1

Related Questions