user1626092
user1626092

Reputation: 499

sas count number of different combinations

I have a data set with information on students' educations on a institution. I want to get a number of how many different combinations of study programmes they have been on. I have information on both master and bachelor level and I want to count the number of different study programmes in each education level (master, bachelor).

For example person1 can have:

Bachelor:
- study1
- study2 
- study3
- study3

Master:
- studyA
- studyA 

Then I want a number of 3 study programmes in bachelor level (study3 should not Count twice), and a number of 1 in masters level. Each study programme has its own row - so in the dataset person1 has 6 rows. I want one row per person telling the number of study programmes per education level:

person   number_bachelor     number_master
person1  3                   1
....etc...

I have tried with this:

proc sql;
create table  new as
select  distinct personid, name, 
count(study) as number_of_bach
from old
group by personid, edu_level, study;
quit;

But it doesn't give me what I want. This gives me two rows with person1 with the values of 1 and 2 in the variable "number_of_bach".

How can I edit this code to get the result I want?

Upvotes: 0

Views: 1067

Answers (2)

Harshad Patil
Harshad Patil

Reputation: 313

Code:

data education;
input person $ level $ program $;
datalines;
person1 bachelor study1
person1 bachelor study2
person1 bachelor study3
person1 bachelor study3
person1 master study1
person2 bachelor study1
person2 master study2
person2 master study1 
;
run;

proc sort data = education nodupkey;
 by person level program;
run;

proc sql;
 select person, 
 sum(case when level eq 'bachelor' then 1 else 0 end) as num_bachelors,
 sum(case when level eq 'bachelor' then 1 else 0 end) as num_masters
 from education
 group by person;
quit;

Working: Here, SORT procedure will eliminate duplicate records, if any. Then SQL procedure only can be used to generate the person wise count of programs at bachelor level as well as count of programs at master level.

Output:

person   num_bachelors    num_masters
person1              3              1
person2              1              2

Upvotes: 2

cstotzer
cstotzer

Reputation: 415

Is this what you want?

DATA old; 
   INPUT personid edu_level $ study $;
   DATALINES; 
1 bachelor study1
1 bachelor study2
1 bachelor study3
1 bachelor study3
1 master studyA
1 master studyA
1 master studyB
; 

PROC SQL;
  CREATE TABLE new AS
  SELECT personid, edu_level, COUNT (DISTINCT study) AS num_bach
    FROM OLD
   GROUP BY personid, edu_level;
QUIT;

The column study is a so-called an aggregate column in your query (because COUNT is an aggregate function) and as such should not be included in the GROUP BY-clause (else your query will also groupy by 'study' and the count will always be 1.

If you want to have one each person on one line then add a PROC TRANSPOSE:

PROC transpose IN = new OUT = new2;
  BY personid;
  ID edu_level;
RUN;

(You could also create a more complex query using subqueries and joins instead of the transpose, as long as you don't have millions of rows the overhead for the TRANSPOSE doesn't matter)

For the sake of completeness here is a SQL-only solution to your question:

PROC SQL;
  CREATE TABLE new AS
  SELECT p.personid, b.num_bachelors, m.num_masters
            /* Select unique personids */
            FROM (SELECT DISTINCT personid 
                    FROM old) AS p
            /* Count number of bachelor-level courses */
            LEFT JOIN (SELECT personid, 
                              COUNT(DISTINCT study) AS num_bachelors 
                         FROM old WHERE edu_level = 'bachelor' 
                        GROUP BY personid) AS b on p.personid = b.personid
            /* Count number of master-level courses */
            LEFT JOIN (SELECT personid, 
                              COUNT(DISTINCT study) AS num_masters 
                         FROM old WHERE edu_level = 'master' 
                        GROUP BY personid) AS m on p.personid = m.personid;

QUIT;

Upvotes: 0

Related Questions