With a SAS dataset like Ob x year pid grp 1 3.88 2001 1 a 2 2.88 2002 1 a 3 0.13 2004 1 a 4 3.70 2005 1 a 5 1.30 2007 1 a 6 0.95 2001 2 b 7 1.79 2002 2 b 8 1.59 2004 2 b 9 1.29 2005 2 b 10 0.96 2007 2 b I would like to get Ob x year pid grp grp X_F1 XL1 1 3.88 2001 1 a a 2.88 . 2 2.88 2002 1 a a . 3.88 3 0.13 2004 1 a a 3.7 . 4 3.7 2005 1 a a . 0.13 5 1.3 2007 1 a a . . 6 0.95 2001 2 b b 1.79 . 7 1.79 2002 2 b b . 0.95 8 1.59 2004 2 b b 1.29 . 9 1.29 2005 2 b b . 1.59 10 0.96 2007 2 b b . . where for observations with the same pid and each year t , x_F1 is the value of x in year t+1 and x_L1 is the value of x in year t-1 In my data set, not all pid s have observations in successive years. My attempt using the expand proc proc expand data=have out=want method=none; by pid; id year; convert x = x_F1 / transformout=(lead 1); convert x = x_F2 / transformout=(lead 2); convert x = x_F3 / transformout=(lead 3); convert x = x_L1 / transformout=(lag 1); convert x = x_L2 / transformout=(lag 2); convert x = x_L3 / transformout=(lag 3); run; did not account for the fact that years are not consecutive.

Reputation: 735

Generating Leads and lags for non-consecutive time periods in SAS

With a SAS dataset like

Ob     x    year   pid  grp 
1    3.88   2001    1    a
2    2.88   2002    1    a 
3    0.13   2004    1    a 
4    3.70   2005    1    a 
5    1.30   2007    1    a 
6    0.95   2001    2    b 
7    1.79   2002    2    b 
8    1.59   2004    2    b 
9    1.29   2005    2    b 
10   0.96   2007    2    b

I would like to get

Ob   x      year    pid grp grp   X_F1      XL1
1   3.88    2001    1   a    a   2.88        .
2   2.88    2002    1   a    a     .        3.88
3   0.13    2004    1   a    a    3.7        .
4   3.7     2005    1   a    a     .        0.13
5   1.3     2007    1   a    a     .         .
6   0.95    2001    2   b    b    1.79       .
7   1.79    2002    2   b    b     .        0.95
8   1.59    2004    2   b    b    1.29       .
9   1.29    2005    2   b    b     .        1.59
10  0.96    2007    2   b    b     .         .

where for observations with the same pid and each year t,

x_F1 is the value of x in year t+1 and
x_L1 is the value of x in year t-1

In my data set, not all pids have observations in successive years.

My attempt using the expand proc

proc expand data=have out=want method=none;
    by pid; id year;
    convert x = x_F1 / transformout=(lead 1);
    convert x = x_F2 / transformout=(lead 2);
    convert x = x_F3 / transformout=(lead 3);
    convert x = x_L1 / transformout=(lag 1);
    convert x = x_L2 / transformout=(lag 2);
    convert x = x_L3 / transformout=(lag 3);
run;

did not account for the fact that years are not consecutive.

Upvotes: 1

Answers (3)

Longfish

Reputation: 7602

You could stick with proc expand to insert the missing years into your data (utilising the extrapolate statement). I've set the from value to day as this is a sequential integer check for days which will work with your data as YEAR is stored as an integer rather than a date.

Like the other answers, it requires 2 passes of the data, but I don't think there's an alternative to this.

data have;
input x year pid grp $;
datalines;
3.88   2001    1    a
2.88   2002    1    a 
0.13   2004    1    a 
3.70   2005    1    a 
1.30   2007    1    a 
0.95   2001    2    b 
1.79   2002    2    b 
1.59   2004    2    b 
1.29   2005    2    b 
0.96   2007    2    b 
;
run;

proc expand data = have out = have1
    method=none extrapolate
    from=day to=day;
by pid;
id year;
run;

proc expand data=have1 out=want method=none;
    by pid; id year;
    convert x = x_F1 / transformout=(lead 1);
    convert x = x_F2 / transformout=(lead 2);
    convert x = x_F3 / transformout=(lead 3);
    convert x = x_L1 / transformout=(lag 1);
    convert x = x_L2 / transformout=(lag 2);
    convert x = x_L3 / transformout=(lag 3);
run;

or this can be done in one go, subject to whether the value of x is important in the final dataset (see comment below).

proc expand data=have1 out=want1 method=none extrapolate from=day to=day;
    by pid; id year;
    convert x = x_F1 / transformout=(lead 1);
    convert x = x_F2 / transformout=(lead 2);
    convert x = x_F3 / transformout=(lead 3);
    convert x = x_L1 / transformout=(lag 1);
    convert x = x_L2 / transformout=(lag 2);
    convert x = x_L3 / transformout=(lag 3);
run;

Upvotes: 3

SRSwift

Reputation: 1710

Here is a simple approach using proc sql. It joins the data with itself twice; once for the forward and once for the backward lag, then takes the required values where they exist.

proc sql;
    create table want as
    select
        a.*,
        b.x as x_f1,
        c.x as x_l1
    from have as a
    left join have as b
        on a.pid = b.pid and a.year = b.year - 1
    left join have as c
        on a.pid = c.pid and a.year = c.year + 1
    order by 
        a.pid, 
        a.year;
run;

Caveats:

It will not expand too well with larger numbers of lags.
This is probably not the quickest approach.
It requires that there be only one observation for each pid year pair, and would need modifying if this is not the case.

Upvotes: 2

stallingOne

Reputation: 4006

Sort your data per group and per year.
compute x_F1 in a data step with a lag and a condition like this: if (year and lag(year) are consecutive) then x_F1=lag(x)
Sort your date the other way around
Compute x_L1 similarly.

I'm trying to write you a working code right now. If you provide me with a data sample (a data step with an infile e.g.), I can better try and test it.

This seems to work with my data:

/*1*/
proc sort data=WORK.QUERY_FOR_EPILABO_CLEAN_NODUP out=test1(where=(year<>1996)) nodupkey;
    by grp year;
run;

quit;

/*2*/
data test2;
    *retain x;
    set test1;
    by grp;
    x_L1=lag(x);
    if first.grp then
        x_L1=.;
    yeardif=dif(year);
    if (yeardif ne 1) then
        x_L1=.;
run;

/*3*/
proc sort data=test2(drop=yeardif) out=test3;
    by grp descending year;
run;

quit;

/*4*/
data test4;
    *retain x;
    set test3;
    by grp;
    x_F1=lag(x);
    if first.grp then
        x_F1=.;
    yeardif=dif(year);
    if (yeardif ne -1) then
        x_F1=.;
run;

Upvotes: 1

Generating Leads and lags for non-consecutive time periods in SAS

Answers (3)

Related Questions