Reputation: 585
I have a table which contains the information of course selected in 2 semesters of each student. Those students didn't validate their first semester, so all the valid_or_not_of_semester='N'
for semester='1st'
:
student semester course_selected valid_or_not_of_semester
A 1st math N
A 1st english N
A 2nd math Y
A 2nd english Y
B 1st math N
B 2nd math Y
B 2nd english Y
C 1st math N
C 2nd math N
For those students who selected math
(or english
) in the first semester, I want to research if they have selected math
(or english
) in the second semester, if yes, I'm going to create a cross-tabulation, which counts the number of those students who validated or not their second semester:
--------------------------------------------------------------------------
1st semester \ 2nd semester | Math | English
invalid \ |---------------------|--------------------
students \ | valid | invalid | valid | invalid
--------------------------------------------------------------------------
Math | 2 | 1 | 2 | 0
--------------------------------------------------------------------------
English | 1 | 0 | 1 | 0
--------------------------------------------------------------------------
Each row stands for the number of students who didn't validate the first semester and have selected the course in the first semester. And columns seprate students who selected the course into valid and invalid of their second semester. To be more precise,
--------------------------------------------------------------------------
1st semester \ 2nd semester | Math | English
invalid \ |---------------------|--------------------
students \ | valid | invalid | valid | invalid
--------------------------------------------------------------------------
Math | 2 | 1 | 2 | 0
| | |
\ / \ / \ /
(students A&B) (student C) (students A&B)
I tried proc sql:
data math;
merge have
have (where=(semester='1st') in=these);
by student;
if these then output;
run;
proc sql;
create table result as
select count(distinct student) as nb_student
from math (where=(semester='2nd'))
group by course_selected, valid_or_not_of_semester;
quit;
And do the same thing for english
.
But is there any way to obtain the result of 2 courses directly? How could I use the proc freq?
Hope to get your answer.
Upvotes: 0
Views: 1147
Reputation: 1120
This doesn't give you exactly the table you were looking for, but it does generate the values you were interested in. The idea is to transpose the original dataset and then count up observations afterwards.
You may also want to look into proc tabulate though you may run into problems because you're double-counting students under certain circumstances.
data temp;
input student $ semester $ course_selected $ valid_or_not_of_semester $;
datalines;
A 1st math N
A 1st english N
A 2nd math Y
A 2nd english Y
B 1st math N
B 2nd math Y
B 2nd english Y
C 1st math N
C 2nd math N
;
proc sort; by student;
run;
proc transpose data = temp out = temp2;
by student;
id course_selected semester;
var valid_or_not_of_semester;
run;
proc sql;
create table temp3 as select distinct
sum(case when math1st = "N" and math2nd = "Y" then 1 else 0 end) as math_math_valid,
sum(case when math1st = "N" and math2nd = "N" then 1 else 0 end) as math_math_invalid,
sum(case when english1st = "N" and math2nd = "Y" then 1 else 0 end) as english_math_valid,
sum(case when english1st = "N" and math2nd = "N" then 1 else 0 end) as english_math_invalid,
sum(case when math1st = "N" and english2nd = "Y" then 1 else 0 end) as math_english_valid,
sum(case when math1st = "N" and english2nd = "N" then 1 else 0 end) as math_english_invalid,
sum(case when english1st = "N" and english2nd = "Y" then 1 else 0 end) as english_english_valid,
sum(case when english1st = "N" and english2nd = "N" then 1 else 0 end) as english_english_invalid
from temp2;
quit;
Upvotes: 1