Reputation: 47
Thanks in advance for any and all help here.
An example data set and desired output is linked below.
I want to compute new variables from the "Have" data set as follows: RE: the average of "R" values within a given "Cat" variable value excluding the specific observation IE the average of "I" responses within a given "Cat" variable value excluding the specific observation
Upvotes: 1
Views: 608
Reputation: 47
Thanks both of you, the left join works, but may present problems with missing data. As such, this appears to be the best solution (thanks to PGStats on SAS Communities):
proc sql;
create table want as
select *,
(sum(R)-coalesce(R, 0))/(count(R)-1+missing(R)) as RE,
(sum(I)-coalesce(I, 0))/(count(I)-1+missing(I)) as IE
from have
group by CAT
order by ID, CAT;
quit;
The PGStats query deals with missing values in the following ways:
The SUM and COUNT aggregate functions operate on non-missing values only. When a R or I value is missing, the average excluding that specific observation is computed anyway from the other values in the same CAT group. When a R or I value is the only non-missing in its CAT group, the average excluding that specific observation is set to missing. When all R or I values are missing in a CAT group, the average is set to missing.
Upvotes: 0
Reputation: 1269463
You can readily do this by joining in the values and some arithmetic:
proc sql;
select t.*,
(sumr - r) / (cnt - 1) as re,
(sumi - i) / (cnt - 1) as ie
from t left join
(select cat, count(*) as cnt, sum(r) as sumr, sum(i) as sumi
from t
group by cat
having count(*) > 1
) tt
on t.cat = tt.cat;
Notice that I used a left join
and having
clause to prevent division by zero.
Upvotes: 1