Reputation: 2455
The original dataset is like:
sub month y
1 1 1
1 2 2
1 3 3
1 5 5
What I want is to get the previous 3 ys based on month for each subject, if y is missing at that month, then the new variable is .
too. Say , for the above example, lag1 is previous y
for last month, lag2
is previous y for 2 months ago, the lag3
is so far so forth:
sub month y lag1 lag2 lag3
1 1 1 . . .
1 2 2 1 . .
1 3 3 2 1 .
1 5 5 . 3 2
The thing is I checked out lag
and dif
function, but in my case, what I want on lag
depends on month
and there also are gaps between month, so I can not use the previous one like lag1
function.
Also I need to do this for many subjects too. Thanks.
Upvotes: 0
Views: 797
Reputation: 63434
SQL solution:
data have;
input sub month y;
datalines;
1 1 1
1 2 2
1 3 3
1 5 5
;;;;
run;
proc sql;
create table want as
select H.sub,H.month, H.y, One.y as lag1, Two.y as lag2, Three.y as lag3
from have H
left join (select * from have) One on H.sub=One.sub and H.month=One.month+1
left join (select * from have) Two on H.sub=two.sub and H.month=Two.month+2
left join (select * from have) Three on H.sub=three.sub and H.month=Three.month+3
;
quit;
Obviously this gets a bit long if you want 36 of them, but it's not all that complex at least. There are all sorts of other ways to do this. Don't use LAG, that's going to be a headache and not really appropriate anyway. A hash table might be more efficient and require less coding, if you're familiar with the concept of a hash.
Hash solution:
data want;
if _n_ = 1 then do;
declare hash h(dataset:'have(rename=y=ly)');
h.defineKey('sub','month');
h.defineData('ly');
h.defineDone();
call missing(sub,month,ly);
end;
set have;
array lags lag1-lag3;
do prevmonth = month-1 to month-3 by -1;
if prevmonth le 0 then leave;
rc=h.find(key:sub,key:prevmonth);
if rc=0 then lags[month-prevmonth] = ly;
call missing(ly);
end;
run;
This is pretty simple to expand up to 36 [or whatever] - just change the length of the array to array lags lag1-lag36
and the do statement do prevmonth=month-1 to month-36 by -1;
The most work you may have to do is arrange things so month works here - either by creating a integer month, or changing the loop criteria to work with month/year or whatnot. You don't show how your data is specified so can't help there.
Upvotes: 2