sas count number of different combinations

Question

I have a data set with information on students' educations on a institution. I want to get a number of how many different combinations of study programmes they have been on. I have information on both master and bachelor level and I want to count the number of different study programmes in each education level (master, bachelor).

For example person1 can have:

Bachelor:
- study1
- study2 
- study3
- study3

Master:
- studyA
- studyA

Then I want a number of 3 study programmes in bachelor level (study3 should not Count twice), and a number of 1 in masters level. Each study programme has its own row - so in the dataset person1 has 6 rows. I want one row per person telling the number of study programmes per education level:

person   number_bachelor     number_master
person1  3                   1
....etc...

I have tried with this:

proc sql;
create table  new as
select  distinct personid, name, 
count(study) as number_of_bach
from old
group by personid, edu_level, study;
quit;

But it doesn't give me what I want. This gives me two rows with person1 with the values of 1 and 2 in the variable "number_of_bach".

How can I edit this code to get the result I want?

cstotzer · Accepted Answer

Is this what you want?

DATA old; 
   INPUT personid edu_level $ study $;
   DATALINES; 
1 bachelor study1
1 bachelor study2
1 bachelor study3
1 bachelor study3
1 master studyA
1 master studyA
1 master studyB
; 

PROC SQL;
  CREATE TABLE new AS
  SELECT personid, edu_level, COUNT (DISTINCT study) AS num_bach
    FROM OLD
   GROUP BY personid, edu_level;
QUIT;

The column study is a so-called an aggregate column in your query (because COUNT is an aggregate function) and as such should not be included in the GROUP BY-clause (else your query will also groupy by 'study' and the count will always be 1.

If you want to have one each person on one line then add a PROC TRANSPOSE:

PROC transpose IN = new OUT = new2;
  BY personid;
  ID edu_level;
RUN;

(You could also create a more complex query using subqueries and joins instead of the transpose, as long as you don't have millions of rows the overhead for the TRANSPOSE doesn't matter)

For the sake of completeness here is a SQL-only solution to your question:

PROC SQL;
  CREATE TABLE new AS
  SELECT p.personid, b.num_bachelors, m.num_masters
            /* Select unique personids */
            FROM (SELECT DISTINCT personid 
                    FROM old) AS p
            /* Count number of bachelor-level courses */
            LEFT JOIN (SELECT personid, 
                              COUNT(DISTINCT study) AS num_bachelors 
                         FROM old WHERE edu_level = 'bachelor' 
                        GROUP BY personid) AS b on p.personid = b.personid
            /* Count number of master-level courses */
            LEFT JOIN (SELECT personid, 
                              COUNT(DISTINCT study) AS num_masters 
                         FROM old WHERE edu_level = 'master' 
                        GROUP BY personid) AS m on p.personid = m.personid;

QUIT;

sas count number of different combinations

Answers (2)

Related Questions