Brian Hill
Brian Hill

Reputation: 23

Create Missing Observations

I have longitudinal data set for University Students that houses various demographic and performance related information. I am interested in studying enrollement status, but the dataset does not contain information for "not enrolled" (ie. if a student left for a semester, then returned). I need a way to create an observation for this "not-enrolled" time period. The dataset is structured as followed:

  Student_ID  Enrollement_Status  Year  Semester  Variable1 Varible2    
    1                 1           2011      1         
    1                 2           2011      2
    1                 1           2012      2 
    2                 2           2011      1
    2                 2           2011      2 

I need an observation for Student_ID 1, with Enrollement_Status of 0 (1 and two indicatate part time/full time), for the Semester=1 in Year=2012. The year is based on academic year (not calender year), so it will be the same for both the Fall/Spring.I also need to copy the information of Variable1 - Variablen, as it does not change in time. It is possible that there is a gap longer than one semester, so in that case, I would need two (or more) observations for each semester the student is not enrolled.

Thanks! Brian

Upvotes: 0

Views: 87

Answers (2)

Jay Corbett
Jay Corbett

Reputation: 28441

Data STUDENTS;
    Input Student_ID Enrollment_Status Year Semester;
    Datalines;
1 1 2011 1
1 2 2011 2
1 1 2012 2
2 2 2011 1
2 2 2011 2
;
Run;

Learned about the SPARSE (PAPER) from @Joe's comment. It seems to do the trick of creating a data set with all possible semesters.

proc freq data=STUDENTS noprint;
tables student_id*year*semester/sparse out=all(drop=percent count);
run;

Then just join the two data sets to fill in the holes. Not as involved as @GordonLinoff's answer, so if I have missed something I am always interested to learn more about SQL

Proc Sql;
Create table WANT as
select a.student_id
, CASE
        When s.Enrollment_Status=. THEN 0
        ELSE s.Enrollment_Status
        END as Enrollment_Status
, a.year, a.semester
from  all a left join students s on a.student_id=s.student_id and a.year=s.year and a.semester=s.semester
;
Quit;

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270873

You can do this using proc sql. The idea is to generate all the rows that you want -- that would be all students and all year/semester combinations. The use left join to bring in the existing information. Unmatched columns will be NULL:

select s.student_id, coalesce(ss.Enrollement_Status) as Enrollement_Status,
       ys.year, ys.semester, s.variable1, s.variable2
from (select distinct year, semester from students) ys cross join
     (select distinct student_id from students) s left join
     students ss
     on ss.year = ys.year and ss.semester = ys.semester and
        ss.student_id = s.student_id;

Upvotes: 1

Related Questions