Reputation: 1764
My data is a list of schools and their performances on certain subject assessments, along with the percentage of gender enrolled in the course. I've created a sample data set below:
data have;
input school $ subject $ perc_male perc_female score similar_school $;
datalines;
X math 51 49 93 Y
X english 48 52 95 Y
X tech 60 40 90 Y
X science 57 43 92 Y
Y math . . 87 X
Y english . . 83 X
Y science . . 81 X
Y language . . 91 X
Z math 40 60 78 Z
Z english 50 50 76 Z
Z science 45 55 80 Z
;
run;
As you can see, no gender percentages were collected for School Y. Research shows that school X has a very similar gender distribution, so I wish to impute the subject-specific percentages from X into Y. Another problem is that Y has a score for languages, while X did not take this assessment. In this case, I wish to take the mean of the imputed values (51, 48, 57) to get 52 for percentages of language course-takers that are male.
Executing this will demonstrate my desired output:
data want;
input school $ subject $ perc_male perc_female score;
datalines;
X math 51 49 93 Y
X english 48 52 95 Y
X tech 60 40 90 Y
X science 57 43 92 Y
Y math 51 49 87 X
Y english 48 52 83 X
Y science 57 43 81 X
Y language 52 48 91 X
Z math 40 60 78 Z
Z english 50 50 76 Z
Z science 45 55 80 Z
;
run;
Got a downvote, so adding what I've tried to almost get me where I need to be. To whoever downvoted, I'd like to know if you have any constructive feedback. Thanks! I'm wondering if there is a way to build in the mean imputation part into my current snippet. Plus, I was thinking there may be a more efficient way to do this. Any help would be greatly appreciated.
proc sql;
select distinct cats("'",similar_school,"'") into :school_list separated by ','
from have
where perc_male=.;
quit;
proc sql;
create table stuff as
select similar_school as school, subject, perc_male, perc_female
from have
where school in (&school_list.);
quit;
proc sql;
create table want2 as
select a.school, a.subject, coalesce(a.perc_male,b.perc_male), coalesce(a.perc_female,b.perc_female), a.score, a.similar_school
from have as a
left join stuff as b
on a.school=b.school and a.subject=b.subject
;
quit;
Upvotes: 0
Views: 97
Reputation: 1576
Based on you expected data, palin simple SQL can solve your problem. You can first do a self join based on school and similar school information and coalesce the perc_male & perc_female information. This will take care of your first issue.. For the 2nd part of the issue you can calculate the mean per school and coalesce perc_male & perc_female information with respective mean of school. Check out the below sql and let me know if it helps.
proc sql;
create table want as
select aa.school
, aa.subject
, coalesce(aa.perc_male, mean(aa.perc_male)) as perc_male
, coalesce(aa.perc_female,mean(aa.perc_female)) as perc_female
, score
, similar_school
from (
select a.school
, a.subject
, coalesce(a.perc_male ,b.perc_male) as perc_male
, coalesce(a.perc_female,b.perc_female) as perc_female
, a.score
, a.similar_school
from have as a
left join have as b
on b.school=a.similar_school
and a.subject=b.subject
) as aa
group by aa.school
;
quit;
Upvotes: 1