Neal801
Neal801

Reputation: 307

SAS-use of lead function

Suppose the dataset has three columns

Date   Region   Price  
01-03    A        1
01-03    A        2
01-03    B        3
01-03    B        4
01-03    A        5
01-04    B        4
01-04    B        6
01-04    B        7 

I try to get the lead price by date and region through following code.

data want;
set have;
by _ric date_l_;
do until (eof);
set have(firstobs=2 keep=price rename=(price=lagprice)) end=eof;
end;
if last.date_l_ then call missing(lagprice);
run;

However, the WANT only have one observations. Then I create new_date=date and try another code:

data want;
set have nobs=nobs;
  do _i = _n_ to nobs until (new_date ne Date);
  if eof1=0 then
  set have (firstobs=2 keep=price rename=(price=leadprice)) end=eof1;
  else leadprice=.;
  end;
run;

With this code, SAS is working slowly. So I think this code is also not appropriate. Could anyone give some suggestions? Thanks

Upvotes: 1

Views: 2794

Answers (2)

kstats9pt3
kstats9pt3

Reputation: 873

Try sorting by the variables you want lead price for then set together twice:

data test;
length Date   Region  $12 Price 8 ; 
input Date  $ Region $   Price ;
datalines; 
    01-03    A        1
    01-03    A        2
    01-03    B        3
    01-03    B        4
    01-03    A        5
    01-04    B        4
    01-04    B        6
    01-04    B        7 
    ;
run;

** sort by vars you want lead price for **;
proc sort data = test;
    by DATE REGION;
run;

** set together twice -- once for lead price and once for all variables **;
data lead_price;
    set test;
    by DATE REGION;
    set test (firstobs = 2 keep = PRICE rename = (PRICE = LEAD_PRICE))
        test (obs = 1 drop = _ALL_);
    if last.DATE or last.REGION then do; 
        LEAD_PRICE = .;
    end;
run;

Upvotes: 2

Stu Sztukowski
Stu Sztukowski

Reputation: 12909

You can use proc expand to generate leads on numeric variables by group. Try the following method instead:

Step 1: Sort by Region, Date

proc sort data=have;
    by Region Date;
run;

Step 2: Create a new ID variable to denote observation numbers

Because you have multiple values per date per region, we need to generate a new ID variable so that proc expand uses lead by observation number rather than by date.

data have2;
    set have;

    _ID_ = _N_;
run;

Step 3: Run proc expand by region with the lead transformation

lead will do exactly as it sounds. You can lead by as many values as you'd like, as long as the data supports it. In this case, we are leading by one observation.

proc expand data=have2
            out=want;
    by Region;
    id _ID_;
    convert Price = Lead_Price / transform=(lead 1) ;
run;

Upvotes: 0

Related Questions