pyll
pyll

Reputation: 1764

Conditional Imputation SAS

My data is a list of schools and their performances on certain subject assessments, along with the percentage of gender enrolled in the course. I've created a sample data set below:

data have;
    input school $ subject $ perc_male perc_female score similar_school $;
datalines;
X math 51 49 93 Y
X english 48 52 95 Y
X tech 60 40 90 Y
X science 57 43 92 Y
Y math . . 87 X
Y english . . 83 X
Y science . . 81 X
Y language . . 91 X
Z math 40 60 78 Z
Z english 50 50 76 Z
Z science 45 55 80 Z
;
run;

As you can see, no gender percentages were collected for School Y. Research shows that school X has a very similar gender distribution, so I wish to impute the subject-specific percentages from X into Y. Another problem is that Y has a score for languages, while X did not take this assessment. In this case, I wish to take the mean of the imputed values (51, 48, 57) to get 52 for percentages of language course-takers that are male.

Executing this will demonstrate my desired output:

data want;
    input school $ subject $ perc_male perc_female score;
datalines;
X math 51 49 93 Y
X english 48 52 95 Y
X tech 60 40 90 Y
X science 57 43 92 Y
Y math 51 49 87 X
Y english 48 52 83 X
Y science 57 43 81 X
Y language 52 48 91 X
Z math 40 60 78 Z
Z english 50 50 76 Z
Z science 45 55 80 Z
;
run;

Got a downvote, so adding what I've tried to almost get me where I need to be. To whoever downvoted, I'd like to know if you have any constructive feedback. Thanks! I'm wondering if there is a way to build in the mean imputation part into my current snippet. Plus, I was thinking there may be a more efficient way to do this. Any help would be greatly appreciated.

proc sql;
    select distinct cats("'",similar_school,"'") into :school_list separated by ','
    from have
    where perc_male=.;
quit;

proc sql;
    create table stuff as
    select similar_school as school, subject, perc_male, perc_female
    from have
    where school in (&school_list.);
quit;

proc sql;
    create table want2 as
    select a.school, a.subject, coalesce(a.perc_male,b.perc_male), coalesce(a.perc_female,b.perc_female), a.score, a.similar_school
    from have as a
    left join stuff as b
        on a.school=b.school and a.subject=b.subject
    ;
quit;

Upvotes: 0

Views: 97

Answers (1)

sushil
sushil

Reputation: 1576

Based on you expected data, palin simple SQL can solve your problem. You can first do a self join based on school and similar school information and coalesce the perc_male & perc_female information. This will take care of your first issue.. For the 2nd part of the issue you can calculate the mean per school and coalesce perc_male & perc_female information with respective mean of school. Check out the below sql and let me know if it helps.

proc sql;
create table want as
select aa.school
     , aa.subject
     , coalesce(aa.perc_male, mean(aa.perc_male)) as perc_male
     , coalesce(aa.perc_female,mean(aa.perc_female)) as perc_female
     , score
     , similar_school
from (
        select a.school
             , a.subject
             , coalesce(a.perc_male ,b.perc_male) as perc_male
             , coalesce(a.perc_female,b.perc_female) as perc_female 
             , a.score 
             , a.similar_school
         from have as a
         left join have as b 
              on b.school=a.similar_school
             and a.subject=b.subject
      ) as aa
group by aa.school
;
quit;

Upvotes: 1

Related Questions