viji
viji

Reputation: 477

sas expanding the code to multiple columns using lag function and performing an operation

My dataset is like this

bucket  D_201009    D_201010    D_201011    D_201012    D_201101    D_201102    D_201103
0   0   0   0   0   0   0   0
1   1   0   0   0   1   0   0
2   3   0   3   0   1   6   3
3   0   0   0   0   0   0   0
4   0   4   0   0   0   0   0
5   4   0   4   0   4   8   1
6   8   0   8   0   8   10  8
7   0   0   0   0   0   0   0
8   7   0   7   0   0   7   3

what I want is this

bucket  D_201009    D_201010    D_201011    D_201012    D_201101    D_201102    D_201103
0   23  4   22  0   14  31  15
1   23  4   22  0   14  31  15
2   22  4   22  0   13  31  15
3   19  4   19  0   12  25  12
4   19  4   19  0   12  25  12
5   19  0   19  0   12  25  12
6   15  0   15  0   8   17  11
7   7   0   7   0   0   7   3
8   7   0   7   0   0   7   3

where the sum is the value for bucket 0 and 1 row the corresponding bucket 2 for column D_201009 =sum-original value(1) and later for bucket 3 for column D_201009 previous value(lag value) -3(value original) and label this column as original column name. I wrote the code to perform one column.

data test;
input bucket D_201009 D_201010 D_201011 D_201012 D_201101 D_201102 D_201103;
datalines;
0 0 0 0 0 0 0 0
1 1 0 0 0 1 0 0
2 3 0 3 0 1 6 3
3 0 0 0 0 0 0 0
4 0 4 0 0 0 0 0
5 4 0 4 0 4 8 1
6 8 0 8 0 8 10 8
7 0 0 0 0 0 0 0
8 7 0 7 0 0 7 3
;
run;

Saving these column names in a macro

proc contents data = test
out = vars(keep = varnum name)
noprint;
run; 
proc sql noprint;
select distinct name
into :orderedvars2 separated by ' '
from vars
where varnum >=2
order by varnum;
quit;

Finding sum of one column only

proc sql;
select sum(D_201009) into :total from test;
quit;

Using lag to perform

data result(drop= D_201009 lag_D_201009 rename=(sum=D_201009));
set test;
retain sum;
if bucket < 2 then sum = &total;
sum = sum(sum, -lag(D_201009));
run;

how do I change the code to work for all columns where the column names are stored as macro &orderedvars2. ?

Upvotes: 1

Views: 898

Answers (3)

Tom
Tom

Reputation: 51601

If I understand this right you want to sum the column and then subtract the value of each observation from the total?

Getting totals is easy, just use proc summary.

Then combine it with the original data. Here is a way that will work without having to worry about the actual variable names. In this program it will sum all variables that start with d_ but you could use any variable list you want. If you have more than 100 variables then change the dimension of the temporary array.

%let varlist=d_:;

* Get sums into variables with same names ;
proc summary data=have ;
  var &varlist ;
  output out=total sum=  ;
run;

data want ;
  set have(obs=0) /* Set variable order */
      total(keep=&varlist) /* Get totals */
      have(keep=&varlist)  /* Get lagged variables */
  ;
  array vars &varlist ;
  array total (100) _temporary_;
  set have (drop=&varlist);  /* Get non-lagged variables */
  do i=1 to dim(vars);
    if _n_>1 then vars(i)=total(i)-vars(i);
    total(i)=vars(i);
  end;
  drop i;
run;

If you have missing values you might want to add this line of code at beginning of the DO loop:

vars(i)=coalesce(vars(i),0);

Upvotes: 0

Joe
Joe

Reputation: 63424

The way I'd approach it would be to transpose the data structure to a more useful data structure; then you don't have to use macro variables. You can use BY processing instead, and no lags.

The way I create the final output is to transpose the initial dataset so you have one row per bucket/D_var, then sort by the D_vars (_NAME_ holds that). Then use a Double DoW loop in order to first calculate the sum, and then to subtract the value. Note I don't have to use Retain or Lag here, I can just directly operate on the value since I'm in a DoW loop. I output before subtracting since that's what you seem to want. Then I retranspose back.

This might not be the fastest option if you have very large data, since it goes through several steps; if you do, you should be using a more efficient algorithm anyway. But it's likely the least fiddly if you don't always have the same columns.

proc transpose data=test out=test_t;
by bucket;
run;

proc sort data=test_t;
  by _name_ bucket;
run;

data want_t;
  do _n_ = 1 by 1 until (last._name_);
    set test_t;
    by _name_ bucket;
    sum_var = sum(sum_var,col1);
  end;
  do _n_ = 1 by 1 until (last._name_);
    set test_t;
    by _name_ bucket;
    output;
    sum_var = sum_var - col1;
  end;
run;

proc sort data=want_t;
by bucket _name_;
run;
proc transpose data=want_t out=want;
  by bucket;
  id _name_;
  var sum_var;
run;

Upvotes: 2

Shenglin Chen
Shenglin Chen

Reputation: 4554

Use proc summary to get sum of each variable, then define multiple arrays.

proc summary data=test;
var D:;
output out=sum(drop=_:) sum=/autoname;
run;

data want;
   set test;
   if _n_=1 then set sum;
   array var1 D_201009--D_201103;
   array var2 D_201009_sum--D_201103_sum;
   array var3 _D_201009 _D_201010 _D_201011 _D_201012 _D_201101 _D_201102 _D_201103;
   array temp (7) _temporary_;
   retain temp;
   do i=1 to dim(var1);
      lag=lag(var1(i));
      if bucket<2 then var3(i)=var2(i);
      else var3(i)=sum(temp(i),-lag);
      temp(i)=var3(i);
   end;
   drop D: lag i;
run;

Upvotes: 0

Related Questions