velvetrock
velvetrock

Reputation: 585

SAS proc freq for distinct value of a variable

I have a table which contains the information of course selected in 2 semesters of each student. Those students didn't validate their first semester, so all the valid_or_not_of_semester='N' for semester='1st':

student    semester    course_selected    valid_or_not_of_semester
  A          1st         math               N
  A          1st         english            N
  A          2nd         math               Y
  A          2nd         english            Y
  B          1st         math               N
  B          2nd         math               Y
  B          2nd         english            Y
  C          1st         math               N
  C          2nd         math               N

For those students who selected math (or english) in the first semester, I want to research if they have selected math (or english) in the second semester, if yes, I'm going to create a cross-tabulation, which counts the number of those students who validated or not their second semester:

--------------------------------------------------------------------------
   1st semester \ 2nd semester |           Math      |  English
     invalid     \             |---------------------|--------------------
     students     \            |  valid |  invalid   | valid |  invalid  
--------------------------------------------------------------------------
          Math                 |   2    |    1       |  2    |     0
--------------------------------------------------------------------------
         English               |   1    |    0       |  1    |     0
--------------------------------------------------------------------------

Each row stands for the number of students who didn't validate the first semester and have selected the course in the first semester. And columns seprate students who selected the course into valid and invalid of their second semester. To be more precise,

--------------------------------------------------------------------------
   1st semester \ 2nd semester |           Math      |  English
     invalid     \             |---------------------|--------------------
     students     \            |  valid |  invalid   | valid |  invalid   
--------------------------------------------------------------------------
          Math                 |   2   |       1     |    2  |     0
                                   |           |          |
                                  \ /         \ /        \ /
                         (students A&B)   (student C)  (students A&B)

I tried proc sql:

data math;
   merge have
     have (where=(semester='1st') in=these);
   by student;
   if these then output;
run;

proc sql;
    create table result as
    select count(distinct student) as nb_student
    from math (where=(semester='2nd'))
    group by course_selected, valid_or_not_of_semester;
quit;

And do the same thing for english.

But is there any way to obtain the result of 2 courses directly? How could I use the proc freq?

Hope to get your answer.

Upvotes: 0

Views: 1147

Answers (1)

Sean
Sean

Reputation: 1120

This doesn't give you exactly the table you were looking for, but it does generate the values you were interested in. The idea is to transpose the original dataset and then count up observations afterwards.

You may also want to look into proc tabulate though you may run into problems because you're double-counting students under certain circumstances.

data temp;
   input student $ semester $ course_selected $ valid_or_not_of_semester $;
   datalines;
     A 1st math N
     A 1st english N
     A 2nd math Y
     A 2nd english Y
     B 1st math N
     B 2nd math Y
     B 2nd english Y
     C 1st math N
     C 2nd math N 
    ;
    proc sort; by student;
run;

proc transpose data = temp out = temp2;
    by student;
    id course_selected semester;
    var valid_or_not_of_semester;
run;

proc sql;
    create table temp3 as select distinct
        sum(case when math1st = "N" and math2nd = "Y" then 1 else 0 end) as math_math_valid,
        sum(case when math1st = "N" and math2nd = "N" then 1 else 0 end) as math_math_invalid,
        sum(case when english1st = "N" and math2nd = "Y" then 1 else 0 end) as english_math_valid,
        sum(case when english1st = "N" and math2nd = "N" then 1 else 0 end) as english_math_invalid,
        sum(case when math1st = "N" and english2nd = "Y" then 1 else 0 end) as math_english_valid,
        sum(case when math1st = "N" and english2nd = "N" then 1 else 0 end) as math_english_invalid,
        sum(case when english1st = "N" and english2nd = "Y" then 1 else 0 end) as english_english_valid,
        sum(case when english1st = "N" and english2nd = "N" then 1 else 0 end) as english_english_invalid
        from temp2;
quit;

Upvotes: 1

Related Questions