J.Sabree
J.Sabree

Reputation: 2536

use case_when in R looking at the number at the end of a column name is greater than the number at the end of another column name

I want to use case_when() in dplyr to make a new categorical column that says what a person's current status is on a training.

I have a tibble that looks like this:

library(dplyr)
problem <- tibble(name = c("Angela", "Claire", "Justin"),
                  status_1 = c("Registered", "No Action", "Completed"),
                  status_2 = c("Withdrawn", "No Action", "Registered"),
                  status_3 = c("No Action", "Registered", "Withdrawn"))

If the person has ever completed the course, their status should be completed (even if they accidentally register for the class again at a later date, as evidence by Justin in this example). Their status should be registered if they haven't completed the course and no later status undoes that, such as "no action" or "withdrawn". They should be status "Not Taken" if they have done nothing or withdrawn later than they registered.

In this example, the final dataset should look like this:

library(dplyr)
solution <- tibble(name = c("Angela", "Claire", "Justin"),
                   status_1 = c("Registered", "No Action", "Completed"),
                   status_2 = c("Withdrawn", "No Action", "Registered"),
                   status_3 = c("No Action", "Registered", "Withdrawn"),
                   current = c("Not Taken", "Registered", "Completed"))

Justin is completed because he completed the course at any point. Angela is Not Taken because she undid her registration. Claire is registered because her furthest status is registered.

This is what I have so far. It correctly categorizes Justin and Claire, but it miscategorizes Angela. I understand why it does miscategorize her, but I don't know how to do registration because it involves looking across later numbers, and R correctly views the variable names as a character.

library(dplyr)
library(purrr)
solution <- problem %>%
  mutate(current_status = pmap_chr(select(., contains("status")), ~
                                     case_when(any(str_detect(c(...), "(?i)Completed")) ~ "Completed",
                                               any(str_detect(c(...), "(?i)Registered")) ~ "Registered", 
                                               any(str_detect(c(...), "(?i)No Action")) | any(str_detect(c(...), "(?i)Withdrawn")) ~ "Not Taken",
                                               TRUE ~ "NA"))) 

Thank you!

Upvotes: 2

Views: 264

Answers (1)

Shree
Shree

Reputation: 11150

Here's one way using apply and case_when. apply goes through all the rows of problem one-at-a-time and calculates result based on case_when conditions.

problem %>% 
 mutate(
   current = 
     apply(select(., starts_with("status")), 1, function(x) {
       case_when(
         "Completed" %in% x ~ "Completed",
         which.max(x=="Registered") > which.max(x %in% c("No Action","Withdrawn")) ~ "Registered",
         TRUE ~ "Not Taken"
       )
     })
  )

# A tibble: 3 x 5
  name   status_1   status_2   status_3   current   
  <chr>  <chr>      <chr>      <chr>      <chr>     
1 Angela Registered Withdrawn  No Action  Not Taken 
2 Claire No Action  No Action  Registered Registered
3 Justin Completed  Registered Withdrawn  Completed 

Outside of pipe, you can simply do -

problem$current <- select(problem, starts_with("status")) %>% 
  apply(., 1, function(x) {
    case_when(
      "Completed" %in% x ~ "Completed",
      which.max(x == "Registered") > which.max(x %in% c("No Action", "Withdrawn")) ~ "Registered",
      TRUE ~ "Not Taken"
    )
  })

Upvotes: 3

Related Questions