liuminzhao
liuminzhao

Reputation: 2455

Creating New Variables Conditional on Others [sas]

The original dataset is like:

sub month y
1   1     1
1   2     2
1   3     3
1   5     5 

What I want is to get the previous 3 ys based on month for each subject, if y is missing at that month, then the new variable is . too. Say , for the above example, lag1 is previous y for last month, lag2 is previous y for 2 months ago, the lag3 is so far so forth:

sub month y lag1 lag2 lag3
1   1     1  .    .    .
1   2     2  1    .    .
1   3     3  2    1    .
1   5     5  .    3    2 

The thing is I checked out lag and dif function, but in my case, what I want on lag depends on month and there also are gaps between month, so I can not use the previous one like lag1 function.

Also I need to do this for many subjects too. Thanks.

Upvotes: 0

Views: 797

Answers (1)

Joe
Joe

Reputation: 63434

SQL solution:

data have;
input sub month y;
datalines;
1   1     1
1   2     2
1   3     3
1   5     5 
;;;;
run;

proc sql;
create table want as
    select H.sub,H.month, H.y, One.y as lag1, Two.y as lag2, Three.y as lag3
    from have H
    left join (select * from have) One on H.sub=One.sub and H.month=One.month+1
    left join (select * from have) Two on H.sub=two.sub and H.month=Two.month+2
    left join (select * from have) Three on H.sub=three.sub and H.month=Three.month+3
;
quit;

Obviously this gets a bit long if you want 36 of them, but it's not all that complex at least. There are all sorts of other ways to do this. Don't use LAG, that's going to be a headache and not really appropriate anyway. A hash table might be more efficient and require less coding, if you're familiar with the concept of a hash.

Hash solution:

data want;
if _n_ = 1 then do;
 declare hash h(dataset:'have(rename=y=ly)');
 h.defineKey('sub','month');
 h.defineData('ly');
 h.defineDone();
 call missing(sub,month,ly);
end;
set have;
array lags lag1-lag3;
do prevmonth = month-1 to month-3 by -1;
  if prevmonth le 0 then leave;
  rc=h.find(key:sub,key:prevmonth);
  if rc=0 then lags[month-prevmonth] = ly;
  call missing(ly);
end;
run;

This is pretty simple to expand up to 36 [or whatever] - just change the length of the array to array lags lag1-lag36 and the do statement do prevmonth=month-1 to month-36 by -1; The most work you may have to do is arrange things so month works here - either by creating a integer month, or changing the loop criteria to work with month/year or whatnot. You don't show how your data is specified so can't help there.

Upvotes: 2

Related Questions