viji
viji

Reputation: 477

SAS-Replicating a simple excel formula of using previous line items(Subtracting)

I am trying to automate a process which is done in excel. I have a data set like below

data test;
input months count_id;
datalines;
0 0
1 787
2 1283
3 1258
4 1579
5 1695
6 1831
7 2060
8 2337
9 2384
;
run;

enter image description here

I want a new column to be created with the value for months=0 and 1 to be the sum of all the values. But for months 2 and below to be like the formula shown. How do I achieve this. I intially thought of splitting the data set months>=2 but realised I have to use the value of month 1 for it.

Upvotes: 1

Views: 285

Answers (2)

david25272
david25272

Reputation: 974

First you need to get the total (I have used proc sql to store the total in a macro variable). Then you just use lag to access the previous row's variables.

proc sql;
 select sum(count_id) into :total from test;
quit;

data result;
 set test;
 retain sum;
 if months < 2 then sum = &total;
 sum = sum(sum, -lag(count_id));
run;

Upvotes: 0

Tom
Tom

Reputation: 51566

First get the total. So you could use PROC SUMMARY for that.

proc summary data=test ;
  var count_id ;
  output out=total sum=sum ;
run;

Now process the data. Using the LAG() function will get the previous value of COUNT_ID, but the LAG() function just creates a stack of values so if you skip executing it for some observations then those values never make it into the stack. So you need to execute the LAG() function for every value for it to work properly. Since you are pulling the SUM variable from a dataset it will automatically be retained across iterations of the data step, so there is no need to "lag" it. The current value will already be the value that it had from the previous data step iteration, so just subtract the lagged count.

data want ;
  set test ;
  if _n_=1 then set total (keep=sum);
  lag_count = lag(count_id);
  if months > 1 then sum = sum - lag_count ;
run;

If the reason you want the first two values to be the same is just because the first value of COUNT_ID is zero then you can simplify even more and eliminate the need to save the LAG() value into a separate variable. Use the sum(,) function to handle the fact that lag(count_id) will be missing on the first pass of the data step.

data want ;
  set test ;
  if _n_=1 then set total (keep=sum);
  sum = sum(sum, - lag(count_id)) ;
run;

Upvotes: 1

Related Questions