Reputation: 7
A beginner here. I have data on students and courses in which they are either enrolled in, have dropped/withdrew, or never enrolled. Hence I have multiple rows of the same student as they are associated with multiple courses in a given term. My desired output is to define each student (and not each course) as enrolled, dropped/withdrew, or never enrolled to a single row in order to count the number of unique students under each category.
If a student is enrolled in at least one course then they would be defined as an enrolled ("E") student regardless the status of the student's other courses. If a student has dropped/withdrew all their courses then they would be defined as a dropped/withdrawn ("DW") student. If the enrollment status of the course for the student is "NA" this means the student applied but never enrolled in any courses and would be defined as a never enrolled ("NE") student.
Would appreciate the appropriate code to do this.
Current output:
STUDENT_ID COURSE COURSE_ENROLLMENT_STATUS
12345 MATH123 ENROLLED
12345 MATH345 DROPPED
45332 NA NA
53930 ENGL302 DROPPED
53930 MATH312 WITHDREW
53930 ENGN102 DROPPED
Desired output:
STUDENT_ID STUDENT_TYPE
12345 E
45332 NE
53930 DW
Upvotes: 0
Views: 719
Reputation: 118
library(dplyr)
df %>%
group_by(STUDENT_ID) %>%
mutate(STUDENT_TYPE = case_when(
any(COURSE_ENROLLMENT_STATUS == "ENROLLED") ~ "E",
all(COURSE_ENROLLMENT_STATUS == "DROPPED" | COURSE_ENROLLMENT_STATUS == "WITHDREW") ~ "DW",
TRUE ~ "NE")) %>%
select(STUDENT_ID, STUDENT_TYPE) %>% distinct()
Upvotes: 1