Reputation: 2546
I have a large dataset that has many columns with status. I want to make a new column that has the participants' current status. I'm trying to use case_when in dplyr, but I'm unsure how to go across columns. The dataset has too many columns for me to type out each column. Here's a sample of the data:
library(dplyr)
problem <- tibble(name = c("sally", "jane", "austin", "mike"),
status1 = c("registered", "completed", "registered", "no action"),
status2 = c("completed", "completed", "registered", "no action"),
status3 = c("completed", "completed", "withdrawn", "no action"),
status4 = c("withdrawn", "completed", "no action", "registered"))
For the code, I want a new column that says the participants' final status; HOWEVER, if their status ever was completed, then I want it to say completed, regardless of what their final status is. For this data, the answer would look like this:
answer <- tibble(name = c("sally", "jane", "austin", "mike"),
status1 = c("registered", "completed", "registered", "no action"),
status2 = c("completed", "completed", "registered", "no action"),
status3 = c("completed", "completed", "withdrawn", "no action"),
status4 = c("withdrawn", "completed", "no action", "registered"),
finalstatus = c("completed", "completed", "no action", "registered"))
Also, if you can include any explanation of your code, I would really appreciate it! If your solution could also use the contains("status"), that would be especially helpful because in my real data set, the status columns are very messy (i.e. summary_status_5292019, sum_status_07012018, etc).
Thank you!
Upvotes: 4
Views: 1865
Reputation: 28705
Here's a function to do this kind of "row matching" operation. Similar to case_when, you can put the checks
vector in a specific order so that when a match is found for one element e.g. 'completed'
in the data, matches on later elements aren't considered.
row_match <- function(data, checks, labels){
matches <- match(unlist(data), checks)
dim(matches) <- dim(data)
labels[apply(matches, 1, min, na.rm = T)]
}
df %>%
mutate(final.stat = row_match(
data = select(df, starts_with('status')),
checks = c('completed', 'withdrawn', 'registered'),
labels = c('completed', 'no action', 'registered')))
# # A tibble: 4 x 6
# name status1 status2 status3 status4 final.stat
# <chr> <chr> <chr> <chr> <chr> <chr>
# 1 sally registered completed completed withdrawn completed
# 2 jane completed completed completed completed completed
# 3 austin registered registered withdrawn no action no action
# 4 mike no action no action no action registered registered
Upvotes: 3
Reputation: 887951
An option with pmap
library(tidyverse)
problem %>%
mutate(finalstatus = pmap_chr(select(., starts_with('status')), ~
case_when(any(c(...) == "completed")~ "completed",
any(c(...) == "withdrawn") ~ "no action",
TRUE ~ "registered")))
Upvotes: 8