J.Sabree
J.Sabree

Reputation: 2546

Use case_when across columns to make a new column

I have a large dataset that has many columns with status. I want to make a new column that has the participants' current status. I'm trying to use case_when in dplyr, but I'm unsure how to go across columns. The dataset has too many columns for me to type out each column. Here's a sample of the data:

library(dplyr)
problem <- tibble(name = c("sally", "jane", "austin", "mike"),
                  status1 = c("registered", "completed", "registered", "no action"),
                  status2 = c("completed", "completed", "registered", "no action"),
                  status3 = c("completed", "completed", "withdrawn", "no action"),
                  status4 = c("withdrawn", "completed", "no action", "registered"))

For the code, I want a new column that says the participants' final status; HOWEVER, if their status ever was completed, then I want it to say completed, regardless of what their final status is. For this data, the answer would look like this:


answer <- tibble(name = c("sally", "jane", "austin", "mike"),
                 status1 = c("registered", "completed", "registered", "no action"),
                 status2 = c("completed", "completed", "registered", "no action"),
                 status3 = c("completed", "completed", "withdrawn", "no action"),
                 status4 = c("withdrawn", "completed", "no action", "registered"),
                 finalstatus = c("completed", "completed", "no action", "registered"))

Also, if you can include any explanation of your code, I would really appreciate it! If your solution could also use the contains("status"), that would be especially helpful because in my real data set, the status columns are very messy (i.e. summary_status_5292019, sum_status_07012018, etc).

Thank you!

Upvotes: 4

Views: 1865

Answers (2)

IceCreamToucan
IceCreamToucan

Reputation: 28705

Here's a function to do this kind of "row matching" operation. Similar to case_when, you can put the checks vector in a specific order so that when a match is found for one element e.g. 'completed' in the data, matches on later elements aren't considered.

row_match <- function(data, checks, labels){
  matches <- match(unlist(data), checks)
  dim(matches) <- dim(data)
  labels[apply(matches, 1, min, na.rm = T)]
}

df %>% 
  mutate(final.stat = row_match(
                        data = select(df, starts_with('status')),
                        checks = c('completed', 'withdrawn', 'registered'),
                        labels = c('completed', 'no action', 'registered')))

# # A tibble: 4 x 6
#   name   status1    status2    status3   status4    final.stat
#   <chr>  <chr>      <chr>      <chr>     <chr>      <chr>     
# 1 sally  registered completed  completed withdrawn  completed 
# 2 jane   completed  completed  completed completed  completed 
# 3 austin registered registered withdrawn no action  no action 
# 4 mike   no action  no action  no action registered registered

Upvotes: 3

akrun
akrun

Reputation: 887951

An option with pmap

library(tidyverse)
problem %>%
     mutate(finalstatus =  pmap_chr(select(., starts_with('status')), ~ 
       case_when(any(c(...) == "completed")~ "completed",
             any(c(...) == "withdrawn") ~ "no action", 
     TRUE ~ "registered")))

Upvotes: 8

Related Questions