J.Sabree

Reputation: 2536

case_when with partial string match and contains()

I'm working with a dataset that has many columns called status1, status2, etc. Within those columns, it says if someone is exempt, complete, registered, etc.

Unfortunately, the exempt inputs are not consistent; here's a sample:

library(dplyr)

problem <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                  status2 = c("exempt", "Completed", "Completed", "Pending"),
                  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))

I'm trying to use case_when() to make a new column that has their final status. If it ever says completed, then they are completed. If it ever says exempt without saying complete, then they are exempt.

The important part is that I want my code to use contains("status"), or some equivalent that only targets the status columns and doesn't require typing them all, and I want it to only require a partial string match for exempt.

As for using contains with case_when, I saw this example, but I wasn't able to apply it to my case: mutate with case_when and contains

This is what I've tried to use so far, but as you can guess, it has not worked:

library(purrr)
library(dplyr)
library(stringr)
solution <- problem %>%
  mutate(final= case_when(pmap_chr(select(., contains("status")), ~
    any(c(...) == str_detect(., "Exempt") ~ "Exclude",
               TRUE ~ "Complete"
  ))))

Here's what I want the final product to look like:

solution <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                   status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                   status2 = c("exempt", "Completed", "Completed", "Pending"),
                   status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"),
                   final = c("Exclude", "Completed", "Completed", "Exclude")) 

Thank you!

Upvotes: 2

Views: 5047

Answers (1)

acylam

Reputation: 18681

I think you are doing it backwards. Put case_when inside pmap_chr instead of the other way around:

library(dplyr)
library(purrr)
library(stringr)

problem %>%
  mutate(final = pmap_chr(select(., contains("status")), 
                          ~ case_when(any(str_detect(c(...), "(?i)Exempt")) ~ "Exclude",
                                      TRUE ~ "Completed")))

For each pmap iteration (that is, each row of the problem dataset), case_when checks whether any of the status values contains the string Exempt. The (?i) prefix in the pattern makes the match case-insensitive. This is the same as writing str_detect(c(...), regex("Exempt", ignore_case = TRUE)).
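
To see that the two spellings of the case-insensitive match agree, here is a small sketch. The statuses vector is just an illustrative sample built from the values in the question, not part of the original code:

```r
library(stringr)

# Illustrative sample: the status values that appear in the question
statuses <- c("7EXEMPT", "exempt", "EXEMPTED", "ExempT - 14", "Completed", "Pending")

# Inline flag: (?i) switches on case-insensitive matching inside the pattern
inline_flag <- str_detect(statuses, "(?i)Exempt")

# Helper form: regex() with ignore_case = TRUE
regex_helper <- str_detect(statuses, regex("Exempt", ignore_case = TRUE))

# Both approaches should give the same logical vector
identical(inline_flag, regex_helper)
```

Both calls flag the four exempt-like values and leave "Completed" and "Pending" unmatched, so which one to use is mostly a matter of readability.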

Output of the pmap_chr() code above:

# A tibble: 4 x 5
  person status1   status2   status3     final    
  <chr>  <chr>     <chr>     <chr>       <chr>    
1 Corey  7EXEMPT   exempt    EXEMPTED    Exclude  
2 Sibley Completed Completed Completed   Completed
3 Justin Completed Completed Completed   Completed
4 Ruth   Pending   Pending   ExempT - 14 Exclude
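
One caveat: the code above labels every row without an exempt match as "Completed", so a row where every status is "Pending" would also come out as "Completed". If that matters, a possible alternative is sketched below. It assumes a dplyr version that provides if_any() (added in 1.0.4), checks for "complete" first as the question describes, and leaves anything else as NA. The result name and the NA fallback are my own choices, not part of the original answer:

```r
library(dplyr)
library(stringr)

problem <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                  status2 = c("exempt", "Completed", "Completed", "Pending"),
                  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))

result <- problem %>%
  mutate(final = case_when(
    # Any status column mentioning "complete" wins, per the question's rule
    if_any(contains("status"),
           ~ str_detect(.x, regex("complete", ignore_case = TRUE))) ~ "Completed",
    # Otherwise, any partial, case-insensitive match on "exempt"
    if_any(contains("status"),
           ~ str_detect(.x, regex("exempt", ignore_case = TRUE))) ~ "Exclude",
    # Assumption: rows matching neither rule are left as NA
    TRUE ~ NA_character_
  ))

result
```

This version works column-wise instead of row by row, so it does not need purrr, and on the sample data it gives the same final column as the output above.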

Upvotes: 6
