Reputation: 477
My dataset is like this
bucket D_201009 D_201010 D_201011 D_201012 D_201101 D_201102 D_201103
0 0 0 0 0 0 0 0
1 1 0 0 0 1 0 0
2 3 0 3 0 1 6 3
3 0 0 0 0 0 0 0
4 0 4 0 0 0 0 0
5 4 0 4 0 4 8 1
6 8 0 8 0 8 10 8
7 0 0 0 0 0 0 0
8 7 0 7 0 0 7 3
what I want is this
bucket D_201009 D_201010 D_201011 D_201012 D_201101 D_201102 D_201103
0 23 4 22 0 14 31 15
1 23 4 22 0 14 31 15
2 22 4 22 0 13 31 15
3 19 4 19 0 12 25 12
4 19 4 19 0 12 25 12
5 19 0 19 0 12 25 12
6 15 0 15 0 8 17 11
7 7 0 7 0 0 7 3
8 7 0 7 0 0 7 3
where the sum is the value for bucket 0 and 1 row the corresponding bucket 2 for column D_201009 =sum-original value(1) and later for bucket 3 for column D_201009 previous value(lag value) -3(value original) and label this column as original column name. I wrote the code to perform one column.
data test;
input bucket D_201009 D_201010 D_201011 D_201012 D_201101 D_201102 D_201103;
datalines;
0 0 0 0 0 0 0 0
1 1 0 0 0 1 0 0
2 3 0 3 0 1 6 3
3 0 0 0 0 0 0 0
4 0 4 0 0 0 0 0
5 4 0 4 0 4 8 1
6 8 0 8 0 8 10 8
7 0 0 0 0 0 0 0
8 7 0 7 0 0 7 3
;
run;
proc contents data = test
out = vars(keep = varnum name)
noprint;
run;
proc sql noprint;
select distinct name
into :orderedvars2 separated by ' '
from vars
where varnum >=2
order by varnum;
quit;
proc sql;
select sum(D_201009) into :total from test;
quit;
data result(drop= D_201009 lag_D_201009 rename=(sum=D_201009));
set test;
retain sum;
if bucket < 2 then sum = &total;
sum = sum(sum, -lag(D_201009));
run;
how do I change the code to work for all columns where the column names are stored as macro &orderedvars2. ?
Upvotes: 1
Views: 898
Reputation: 51601
If I understand this right you want to sum the column and then subtract the value of each observation from the total?
Getting totals is easy, just use proc summary
.
Then combine it with the original data. Here is a way that will work without having to worry about the actual variable names. In this program it will sum all variables that start with d_
but you could use any variable list you want. If you have more than 100 variables then change the dimension of the temporary array.
%let varlist=d_:;
* Get sums into variables with same names ;
proc summary data=have ;
var &varlist ;
output out=total sum= ;
run;
data want ;
set have(obs=0) /* Set variable order */
total(keep=&varlist) /* Get totals */
have(keep=&varlist) /* Get lagged variables */
;
array vars &varlist ;
array total (100) _temporary_;
set have (drop=&varlist); /* Get non-lagged variables */
do i=1 to dim(vars);
if _n_>1 then vars(i)=total(i)-vars(i);
total(i)=vars(i);
end;
drop i;
run;
If you have missing values you might want to add this line of code at beginning of the DO loop:
vars(i)=coalesce(vars(i),0);
Upvotes: 0
Reputation: 63424
The way I'd approach it would be to transpose the data structure to a more useful data structure; then you don't have to use macro variables. You can use BY processing instead, and no lags.
The way I create the final output is to transpose the initial dataset so you have one row per bucket/D_var, then sort by the D_vars (_NAME_ holds that). Then use a Double DoW loop in order to first calculate the sum, and then to subtract the value. Note I don't have to use Retain or Lag here, I can just directly operate on the value since I'm in a DoW loop. I output before subtracting since that's what you seem to want. Then I retranspose back.
This might not be the fastest option if you have very large data, since it goes through several steps; if you do, you should be using a more efficient algorithm anyway. But it's likely the least fiddly if you don't always have the same columns.
proc transpose data=test out=test_t;
by bucket;
run;
proc sort data=test_t;
by _name_ bucket;
run;
data want_t;
do _n_ = 1 by 1 until (last._name_);
set test_t;
by _name_ bucket;
sum_var = sum(sum_var,col1);
end;
do _n_ = 1 by 1 until (last._name_);
set test_t;
by _name_ bucket;
output;
sum_var = sum_var - col1;
end;
run;
proc sort data=want_t;
by bucket _name_;
run;
proc transpose data=want_t out=want;
by bucket;
id _name_;
var sum_var;
run;
Upvotes: 2
Reputation: 4554
Use proc summary to get sum of each variable, then define multiple arrays.
proc summary data=test;
var D:;
output out=sum(drop=_:) sum=/autoname;
run;
data want;
set test;
if _n_=1 then set sum;
array var1 D_201009--D_201103;
array var2 D_201009_sum--D_201103_sum;
array var3 _D_201009 _D_201010 _D_201011 _D_201012 _D_201101 _D_201102 _D_201103;
array temp (7) _temporary_;
retain temp;
do i=1 to dim(var1);
lag=lag(var1(i));
if bucket<2 then var3(i)=var2(i);
else var3(i)=sum(temp(i),-lag);
temp(i)=var3(i);
end;
drop D: lag i;
run;
Upvotes: 0