Justin
Justin

Reputation: 47

SAS: Compute mean by grouping variable excluding observation

Thanks in advance for any and all help here.

An example data set and desired output is linked below.

I want to compute new variables from the "Have" data set as follows: RE: the average of "R" values within a given "Cat" variable value excluding the specific observation IE the average of "I" responses within a given "Cat" variable value excluding the specific observation

enter image description here

Upvotes: 1

Views: 608

Answers (2)

Justin
Justin

Reputation: 47

Thanks both of you, the left join works, but may present problems with missing data. As such, this appears to be the best solution (thanks to PGStats on SAS Communities):

proc sql;
create table want as
select *, 
    (sum(R)-coalesce(R, 0))/(count(R)-1+missing(R)) as RE,
    (sum(I)-coalesce(I, 0))/(count(I)-1+missing(I)) as IE
from have
group by CAT
order by ID, CAT;
quit;

The PGStats query deals with missing values in the following ways:

The SUM and COUNT aggregate functions operate on non-missing values only. When a R or I value is missing, the average excluding that specific observation is computed anyway from the other values in the same CAT group. When a R or I value is the only non-missing in its CAT group, the average excluding that specific observation is set to missing. When all R or I values are missing in a CAT group, the average is set to missing.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269463

You can readily do this by joining in the values and some arithmetic:

proc sql;
    select t.*,
           (sumr - r) / (cnt - 1) as re,
           (sumi - i) / (cnt - 1) as ie
    from t left join
         (select cat, count(*) as cnt, sum(r) as sumr, sum(i) as sumi
          from t
          group by cat
          having count(*) > 1
         ) tt
         on t.cat = tt.cat;

Notice that I used a left join and having clause to prevent division by zero.

Upvotes: 1

Related Questions