Reputation: 3
Here is a very basic question, but I'm unable to find an easy way to do it. I have a dataset that references different highschools and students :
Highschool Students Sexe
A 1 m
A 2 m
A 3 m
A 4 f
A 5 f
B 1 m
B 2 m
And I'd like to create two new variables that count the number of male and female in each schools :
Highschool Students Sexe Nb_m Nb_f
A 1 m 1 0
A 2 m 2 0
A 3 m 3 0
A 4 f 3 1
A 5 f 3 2
B 1 m 1 0
B 2 m 2 0
And I can finally extract the last line with the total that would look like this :
Highschool Students Sexe Nb_m Nb_f
A 5 f 3 2
B 2 m 2 0
Any ideas ?
Upvotes: 0
Views: 2228
Reputation: 7769
You can do this in a single PROC SQL
step...
Also, I don't think you really need the value of Sexe
from the last row.
proc sql ; create table want as select Highschool, sum(case when Sexe = 'f' then 1 else 0 end) as Nb_f, sum(case when Sexe = 'm' then 1 else 0 end) as Nb_m, Nb_f + Nb_m as Students group by Highschool order by Highschool ; quit ;
Upvotes: 1
Reputation: 1107
First you have to sort your dataset by Highschool:
proc sort data = your_dataset;
by Highschool;
run;
then you use
data new_dataset;
set your_dataset;
by Highschool;
retain Nb_m Nb_f;
if Sexe = 'm' then
Nb_m + 1;
else
Nb_f + 1;
if last.Highschool then do;
Students = Nb_m + Nb_f;
output;
Nb_m = 0;
Nb_f = 0;
end;
run;
Upvotes: 0