How to create new column replacing part of the strings in existing column based on new vector of strings?

Question

I want to create a new column in my data frame based on the values of another column that contains a set of strings. Some of the strings should be changed, while the others should be kept as is.

To keep things short, I want to do this using a vector of strings that specifies which strings I want to change and a vector of strings that I want to change the matches into.

I usually do this with dplyr's mutate() and case_when() functions. In the following code, I want to change Paul and Barbara to Anna and Fred respectively, while keeping the other names.

library(dplyr)
library(tibble)

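# 12 names: Paul, Barbara, Joey and Iris repeated three times
a <- rep(c("Paul", "Barbara", "Joey", "Iris"), 3)
test <- enframe(a)  # tibble with columns name (row index) and value

mutate(test,
  name2 = case_when(
    value == "Paul" ~ "Anna",
    value == "Barbara" ~ "Fred",
    TRUE ~ value  # keep all other names unchanged
  )
)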

Given that the real dataset is much longer, I would like to use vectors of strings as described above. Using value %in% b works to find the matching cells, but using vector d to replace the hits throws an error:

b <- c("Paul", "Barbara")  # only Paul and Barbara need to change
d <- c("Anna", "Fred")     # they need to change to Anna and Fred

mutate(test,
  name2 = case_when(
    value %in% b ~ d,
    TRUE ~ value
  )
)

Error in mutate():
! Problem while computing name2 = case_when(value %in% b ~ d, TRUE ~ value).
Caused by error in case_when():
! value %in% b ~ d must be length 12 or one, not 2.
Run rlang::last_error() to see where the error occurred.

I was hoping that if a value matched the second element of b, the second element of d would be used. Clearly this does not work that way, since value %in% b returns a vector of 12 TRUE/FALSE values while d has length 2. Is there any way to work with vectors of strings like this?
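A base R sketch (not from the original post) of the position-based replacement the question describes: value %in% b only tells you whether each element matches, whereas match() tells you where it matches in b (or NA), so d[idx] picks the replacement at the same position. The helper names in_b, idx and name2 are only for illustration.

```r
# Vectors from the question: d[i] is the replacement for b[i]
b <- c("Paul", "Barbara")
d <- c("Anna", "Fred")

# Stand-in for the value column of test (12 names)
value <- rep(c("Paul", "Barbara", "Joey", "Iris"), 3)

# %in% only answers "does it match?": 12 TRUE/FALSE values
in_b <- value %in% b

# match() answers "where does it match?": position in b, or NA if absent
idx <- match(value, b)

# d[idx] is the replacement at the matched position (NA where idx is NA);
# keep the original name wherever there was no match
name2 <- ifelse(is.na(idx), value, d[idx])

print(name2)
```

Inside mutate() the same idea could be written as name2 = coalesce(d[match(value, b)], value).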

thothal · Accepted Answer

I would do it like this:

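# Named lookup vector: the names are the old values, the elements the new ones
lkp <- c("Anna", "Fred") %>%
   setNames(c("Paul", "Barbara"))

test %>%
   # lkp[value] is NA for names not in lkp; coalesce() falls back to value
   mutate(name2 = coalesce(lkp[value], value))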
# # A tibble: 12 × 3
#     name value   name2
#    <int> <chr>   <chr>
#  1     1 Paul    Anna 
#  2     2 Barbara Fred 
#  3     3 Joey    Joey 
#  4     4 Iris    Iris 
#  5     5 Paul    Anna 
#  6     6 Barbara Fred 
#  7     7 Joey    Joey 
#  8     8 Iris    Iris 
#  9     9 Paul    Anna 
# 10    10 Barbara Fred 
# 11    11 Joey    Joey 
# 12    12 Iris    Iris

The idea is to create a named vector whose values are the new values and whose names are the old values. Then you do a simple lookup and use coalesce() to replace the resulting NAs (names that are not in the lookup vector) with the original values.
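To see the two steps of this approach separately, here is a small base R sketch (an illustration, not part of the original answer) that builds the lookup from b and d in the question and uses a shorter hypothetical value vector. The ifelse() call plays the role of coalesce() here.

```r
b <- c("Paul", "Barbara")  # old values
d <- c("Anna", "Fred")     # new values

# Named lookup vector: names are the old values, elements the new ones
lkp <- setNames(d, b)

# Hypothetical short input
value <- c("Paul", "Joey", "Barbara", "Iris")

# Step 1: lookup by name; names missing from lkp give NA
hits <- lkp[value]

# Step 2: fill the NAs with the original values (what coalesce() does);
# unname() drops the names that lkp[value] carries along
name2 <- unname(ifelse(is.na(hits), value, hits))

print(name2)
```

One thing to keep in mind: lkp[value] returns a named vector, so in the dplyr version name2 may carry those names as well; wrapping the result in unname() removes them if they get in the way.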
