biostatguy12
biostatguy12

Reputation: 659

SAS lag function where only first row of data is known?

this should be very simple but I'm having trouble. I have something like

data data;
   infile datalines delimiter=','; 
   input num    total   N   want;;
   datalines;                      
num total   N   want
633 3000    2367    2367
450 3000    .   1734
22  3000    .   1284
11  3000    .   1262
10  3000    .   1251
9   3000    .   1241
4   3000    .   1232
5   3000    .   1228
1   3000    .   1223
5   3000    .   1222
;
run;

I want to fill in the missing, where I take the previous row N-num to give me my new want. So N1 should be 2367, 1734, 1284 etc.

I try :

data data1;
set data;
if N=. then want=lag(N)-lag(num);
run;

but always get an error, as if it is trying to compute the lag globally rather than starting at the top row then moving thru. Any advice? Maybe even using proc sql?

Upvotes: 0

Views: 439

Answers (1)

Tom
Tom

Reputation: 51601

Sounds like you just want to RETAIN a NEW variable. (RETAIN does not work very well for variables that are coming in from the source dataset since when the next observation is read the retained value is overwritten.)

So start with a dataset with just the two variables.

data have;
   input num total;
datalines;                      
633 3000
450 3000
22  3000
11  3000
10  3000
9   3000
4   3000
5   3000
1   3000
5   3000
;

Now you can create a new variable that subtracts NUM. Let's call this new variable REMAINDER.

data want;
  set have;
  if _n_=1 then remainder=total;
  remainder + -num ;
run;

The SUM statement automatically retains the variable you are summing into.

Result:

Obs    num    total    remainder

  1    633     3000       2367
  2    450     3000       1917
  3     22     3000       1895
  4     11     3000       1884
  5     10     3000       1874
  6      9     3000       1865
  7      4     3000       1861
  8      5     3000       1856
  9      1     3000       1855
 10      5     3000       1850

Or if you would rather subtract the previous value of NUM you could either just write the observations before you do the subtraction.

  output;
  remainder + -num;

Or use the LAG() function to subtract the previous value. LAG() on the first observation will return a missing value which we be treated as zero by the SUM statement.

  remainder + -lag(num) ;

Results:

Obs    num    total    remainder

  1    633     3000       3000
  2    450     3000       2367
  3     22     3000       1917
  4     11     3000       1895
  5     10     3000       1884
  6      9     3000       1874
  7      4     3000       1865
  8      5     3000       1861
  9      1     3000       1856
 10      5     3000       1855

Upvotes: 1

Related Questions