Reputation: 23
I have longitudinal data set for University Students that houses various demographic and performance related information. I am interested in studying enrollement status, but the dataset does not contain information for "not enrolled" (ie. if a student left for a semester, then returned). I need a way to create an observation for this "not-enrolled" time period. The dataset is structured as followed:
Student_ID Enrollement_Status Year Semester Variable1 Varible2
1 1 2011 1
1 2 2011 2
1 1 2012 2
2 2 2011 1
2 2 2011 2
I need an observation for Student_ID 1, with Enrollement_Status of 0 (1 and two indicatate part time/full time), for the Semester=1 in Year=2012. The year is based on academic year (not calender year), so it will be the same for both the Fall/Spring.I also need to copy the information of Variable1 - Variablen, as it does not change in time. It is possible that there is a gap longer than one semester, so in that case, I would need two (or more) observations for each semester the student is not enrolled.
Thanks! Brian
Upvotes: 0
Views: 87
Reputation: 28441
Data STUDENTS;
Input Student_ID Enrollment_Status Year Semester;
Datalines;
1 1 2011 1
1 2 2011 2
1 1 2012 2
2 2 2011 1
2 2 2011 2
;
Run;
Learned about the SPARSE (PAPER) from @Joe's comment. It seems to do the trick of creating a data set with all possible semesters.
proc freq data=STUDENTS noprint;
tables student_id*year*semester/sparse out=all(drop=percent count);
run;
Then just join the two data sets to fill in the holes. Not as involved as @GordonLinoff's answer, so if I have missed something I am always interested to learn more about SQL
Proc Sql;
Create table WANT as
select a.student_id
, CASE
When s.Enrollment_Status=. THEN 0
ELSE s.Enrollment_Status
END as Enrollment_Status
, a.year, a.semester
from all a left join students s on a.student_id=s.student_id and a.year=s.year and a.semester=s.semester
;
Quit;
Upvotes: 0
Reputation: 1270873
You can do this using proc sql
. The idea is to generate all the rows that you want -- that would be all students and all year/semester combinations. The use left join
to bring in the existing information. Unmatched columns will be NULL
:
select s.student_id, coalesce(ss.Enrollement_Status) as Enrollement_Status,
ys.year, ys.semester, s.variable1, s.variable2
from (select distinct year, semester from students) ys cross join
(select distinct student_id from students) s left join
students ss
on ss.year = ys.year and ss.semester = ys.semester and
ss.student_id = s.student_id;
Upvotes: 1