Reputation: 49
I have unique groups that consist of 2 or 3 people. Each person has a value for a variable X (or missing). For each group, i want to sum the values for all observations in that group AND assign the result to each person in the group.
the have dataset looks like this
group ID X
1 110 2.5
1 111 1.0
1 112 0.0
2 210 .
2 211 5.6
3 212 2.0
3 213 1.5
4 310 .
4 311 7.2
4 312 2.1
And the dataset I want with a new variable Y loos like this:
group ID X Y
1 110 2.5 3.5
1 111 1.0 3.5
1 112 0.0 3.5
2 210 . 5.6
2 211 5.6 5.6
3 212 2.0 3.5
3 213 1.5 3.5
4 310 . 9.3
4 311 7.2 9.3
4 312 2.1 9.3
Thanks for your assistance
Upvotes: 0
Views: 403
Reputation: 3315
if you data is already sorted Dow loop solution and mean followed by merge solution as suggested by @Richard in comments.
/*DOW LOOP solution*/
data want;
do until(last.group);
set have;
by group;
if first.group then Y=X;
else Y+X;
end;
do until(last.group);
set have;
by group;
output;
end;
run;
/* mean followed by merge solution*/
proc means data = have nway noprint;
class group;
var X;
output out=have1 ( keep = Group Y) sum=Y;
run;
data want;
merge have have1;
by group;
run;
Upvotes: 1
Reputation: 27498
@Kiran, your coding of serial do … until
with last.group
as the terminus condition does not work because the last.group=1
state persists despite the set;by;
within the second loop, and thus the second loop only iterates once.
A better solution, as long espoused by Dorfman, is the pattern of a first loop computing group size with 1 by 1
, followed by a second loop explicitly looping 1 to _n_
data have; input
group ID X; datalines;
1 110 2.5
1 111 1.0
1 112 0.0
2 210 .
2 211 5.6
3 212 2.0
3 213 1.5
4 310 .
4 311 7.2
4 312 2.1
run;
data want;
do _n_ = 1 by 1 until (last.group);
set have;
by group;
groupsum = sum(groupsum,x);
end;
do _n_ = 1 to _n_ ;
set have;
output; * group related computation result gets carried into the output data set (want);
end;
run;
The pattern part do _n_ = 1 to _n_
becomes more understandable when the coder realizes that SAS data step evaluates the start, stop and by values for the loop prior to actually iterating the loop.
Upvotes: 1
Reputation: 133
Assuming your table is called "ds":
PROC SQL;
SELECT group, ID, x, sum(x) AS y
FROM DS
GROUP BY group;
QUIT;
Upvotes: 2