Reputation: 2536
I want to use case_when()
in dplyr to make a new categorical column that says what a person's current status is on a training.
I have a tibble that looks like this:
library(dplyr)
problem <- tibble(name = c("Angela", "Claire", "Justin"),
status_1 = c("Registered", "No Action", "Completed"),
status_2 = c("Withdrawn", "No Action", "Registered"),
status_3 = c("No Action", "Registered", "Withdrawn"))
If the person has ever completed the course, their status should be completed (even if they accidentally register for the class again at a later date, as evidence by Justin in this example). Their status should be registered if they haven't completed the course and no later status undoes that, such as "no action" or "withdrawn". They should be status "Not Taken" if they have done nothing or withdrawn later than they registered.
In this example, the final dataset should look like this:
library(dplyr)
solution <- tibble(name = c("Angela", "Claire", "Justin"),
status_1 = c("Registered", "No Action", "Completed"),
status_2 = c("Withdrawn", "No Action", "Registered"),
status_3 = c("No Action", "Registered", "Withdrawn"),
current = c("Not Taken", "Registered", "Completed"))
Justin is completed because he completed the course at any point. Angela is Not Taken because she undid her registration. Claire is registered because her furthest status is registered.
This is what I have so far. It correctly categorizes Justin and Claire, but it miscategorizes Angela. I understand why it does miscategorize her, but I don't know how to do registration because it involves looking across later numbers, and R correctly views the variable names as a character.
library(dplyr)
library(purrr)
solution <- problem %>%
mutate(current_status = pmap_chr(select(., contains("status")), ~
case_when(any(str_detect(c(...), "(?i)Completed")) ~ "Completed",
any(str_detect(c(...), "(?i)Registered")) ~ "Registered",
any(str_detect(c(...), "(?i)No Action")) | any(str_detect(c(...), "(?i)Withdrawn")) ~ "Not Taken",
TRUE ~ "NA")))
Thank you!
Upvotes: 2
Views: 264
Reputation: 11150
Here's one way using apply
and case_when
. apply
goes through all the rows of problem
one-at-a-time and calculates result based on case_when
conditions.
problem %>%
mutate(
current =
apply(select(., starts_with("status")), 1, function(x) {
case_when(
"Completed" %in% x ~ "Completed",
which.max(x=="Registered") > which.max(x %in% c("No Action","Withdrawn")) ~ "Registered",
TRUE ~ "Not Taken"
)
})
)
# A tibble: 3 x 5
name status_1 status_2 status_3 current
<chr> <chr> <chr> <chr> <chr>
1 Angela Registered Withdrawn No Action Not Taken
2 Claire No Action No Action Registered Registered
3 Justin Completed Registered Withdrawn Completed
Outside of pipe, you can simply do -
problem$current <- select(problem, starts_with("status")) %>%
apply(., 1, function(x) {
case_when(
"Completed" %in% x ~ "Completed",
which.max(x == "Registered") > which.max(x %in% c("No Action", "Withdrawn")) ~ "Registered",
TRUE ~ "Not Taken"
)
})
Upvotes: 3