Reputation: 13
I have a dataset with results for over 120 different rows. There are many columns and rows, but I just want to add together the values of 2 rows within the same column.
Here is an example of what some of the most important columns look like and the output I'm trying to achieve (**edited after receiving 2 answers that didn't do exactly what I needed and to more accurately reflect what I'm looking for):
data All;
input Sample$ Matrix$ Var$ Result Units$;
datalines;
S01 Blood A 10 ng/mL
S01 Blood B 12 ng/mL
S01 Blood C 14 ng/mL
S01 Blood D 4 ng/mL
S01 Urine D 6 ng/mL
;
run;
I want the output to look like:
Sample Matrix Var Result Units
S01 Blood A+C 24 ng/mL
S01 Blood B 12 ng/mL
S01 Blood D 4 ng/mL
S01 Urine D 6 ng/mL
As mentioned, there are up to 130 observations and more than 10 variables, so I don't want to transpose the whole dataset (something we've done for other manipulations of this dataset). I've seen mention of using the retain statement, but I don't know if that will work for this case.
Upvotes: 1
Views: 678
Reputation: 51611
Just use a format that collapses A and B into A+B.
proc format ;
value $collapse
'A','B' ='A+B'
;
run;
proc summary data=all nway ;
class var;
format var $collapse.;
var result1-result3;
output out=want sum= ;
run;
Result
Obs Var _TYPE_ _FREQ_ Result1 Result2 Result3
1 A+B 1 2 22 60 110
2 C 1 1 14 16 200
3 D 1 1 4 120 140
For your updated question just use PROC SUMMARY only on the observations you want to SUM and then combine it back with the original data (without those observations).
data All;
input Sample $ Matrix $ Var $ Result Units $;
datalines;
S01 Blood A 10 ng/mL
S01 Blood B 12 ng/mL
S01 Blood C 14 ng/mL
S01 Blood D 4 ng/mL
S01 Urine D 6 ng/mL
;
proc summary data=all ;
where var in ('A' 'B');
var result ;
output out=ab(drop=_type_ _freq_) sum=
idgroup(out[1] (Sample Matrix Units)=)
;
run;
data want;
set all(where=(not (var in ('A' 'B'))))
ab(in=in1);
;
if in1 then Var='A+B';
run;
Results
Obs Sample Matrix Var Result Units
1 S01 Blood C 14 ng/mL
2 S01 Blood D 4 ng/mL
3 S01 Urine D 6 ng/mL
4 S01 Blood A+B 22 ng/mL
If you have more variables you want summed add them to the VAR statement. If you have more variables you want copied then add them into the IDGROUP clause of the OUTPUT statement.
In your real data probably some of those variables, like SAMPLE and MATRIX, should be BY variables (or additional CLASS variables) in the SUMMARY step.
If you have multiple sets of values that need to be collapsed you could still use the format idea, only adjust it so it can also be used to help with the WHERE clause.
proc format ;
value $collapse
'A','B' = 'A+B'
'E','F' = 'E+F'
other = ' '
;
run;
proc summary data=all nway;
where put(var,$collapse.) ne ' ';
by Sample Matrix;
class var ;
format var $collapse.;
var result ;
output out=ab(drop=_type_ _freq_) sum=
idgroup(out[1] (Units)=)
;
run;
data want;
set all(where=(put(var,$collapse.) eq ' '))
ab(in=in1);
;
by Sample Matrix;
if in1 then Var=put(var,$collapse.);
format var ;
run;
Upvotes: 1
Reputation: 12909
It can be done with a data step using retain
. SAS is inherently a looping language when the set
statement is used and will reset the value of variables at each row, then populate them with values it reads in. retain
prevents a variable from being reset so it is carried forward until it is changed again.
On row C, we'll rename var
to A+C
then sum the result
and the retained (stored) value from A
.
data want;
set all;
retain a_result;
/* Store the value of 'result' when var = 'A' */
if(var = 'A') then a_result = result;
/* Do not continue or output if var = 'A'. That is,
if var is not equal to 'A', then run the code below
and output. */
if(var NE 'A');
/* On row C, sum the result + the stored result when var = 'A'*/
if(var = 'C') then do;
var = 'A+C';
result = result + a_result;
end;
drop i a_result;
run;
proc sort data=want;
by sample var matrix;
run;
Sample Matrix Var Result Units
S01 Blood A+C 24 ng/mL
S01 Blood B 12 ng/mL
S01 Blood D 4 ng/mL
S01 Urine D 6 ng/mL
Upvotes: 0