Bram Van den Bergh

Reputation: 13

How to create new column replacing part of the strings in existing column based on new vector of strings?

I want to create a new column in my data frame based on the values of another column that contains a set of strings. For some of the strings, I want to change the string, others I want to keep as is.

To keep things short, I want to do this using a vector of strings that specifies which strings I want to change and a vector of strings that I want to change the matches into.

I usually do this with dplyr's mutate and case_when functions. In the following code, I want to change Paul and Barbara to Anna and Fred respectively, while keeping the other names.

library(dplyr)
library(tibble)

a<-rep(c("Paul", "Barbara","Joey","Iris"),3)
test<-enframe(a)

mutate(test,
  name2 = case_when(
   value == "Paul" ~ "Anna",
   value == "Barbara" ~ "Fred", 
   TRUE ~ value)
)

Given that the real dataset is much longer, I would like to use vectors of strings as specified earlier. Using %in% b works to find the matching cells but using vector d to replace the hits throws an error:

b<-c("Paul","Barbara") #only Paul and Barbara need to change
d<-c("Anna","Fred") #they need to change to Anna and Fred

mutate(test,
       name2 = case_when(
           value %in% b ~ d,   # d has length 2, which triggers the error below
           TRUE ~ value)
)

Error in mutate():
! Problem while computing name2 = case_when(value %in% b ~ d, TRUE ~ value).
Caused by error in case_when():
! value %in% b ~ d must be length 12 or one, not 2.
Run rlang::last_error() to see where the error occurred.

I was hoping that if the match were with the second element of b, the second element of d would be used. Clearly, since value %in% b returns a vector of 12 TRUE/FALSE values, it does not work that way, but is there any way to work with vectors of strings like this?

Upvotes: 1

Views: 53

Answers (1)

thothal

Reputation: 20409

I would do it like this:

lkp <- c("Anna","Fred") %>%
   setNames(c("Paul", "Barbara"))

test %>%
   mutate(name2 = coalesce(lkp[value], value))
# # A tibble: 12 × 3
#     name value   name2
#    <int> <chr>   <chr>
#  1     1 Paul    Anna 
#  2     2 Barbara Fred 
#  3     3 Joey    Joey 
#  4     4 Iris    Iris 
#  5     5 Paul    Anna 
#  6     6 Barbara Fred 
#  7     7 Joey    Joey 
#  8     8 Iris    Iris 
#  9     9 Paul    Anna 
# 10    10 Barbara Fred 
# 11    11 Joey    Joey 
# 12    12 Iris    Iris 

The idea is to create a named vector whose values are the new strings and whose names are the old strings. Indexing it with value then works as a simple lookup that returns NA for any name not in the lookup vector, and coalesce replaces those NAs with the original values.
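As a minimal sketch, the same lookup can be built directly from the vectors b and d in the question, assuming they have equal length and are in matching order. The unname() call and the base R variant using match() are additions for illustration, not part of the answer above.

```r
library(dplyr)
library(tibble)

# Data and vectors from the question
a <- rep(c("Paul", "Barbara", "Joey", "Iris"), 3)
test <- enframe(a)
b <- c("Paul", "Barbara")  # old values
d <- c("Anna", "Fred")     # new values, in the same order as b

# Assumption: b and d are paired element by element
stopifnot(length(b) == length(d))

# Named lookup vector: names are the old values, elements are the new ones
lkp <- setNames(d, b)

# Look up each value; unmatched names give NA, which coalesce()
# replaces with the original value
result <- test %>%
  mutate(name2 = coalesce(unname(lkp[value]), value))

# Base R variant of the same idea (illustrative): match() returns the
# position of each value in b, or NA when there is no match
idx <- match(test$value, b)
name2_base <- ifelse(is.na(idx), test$value, d[idx])
```

Both variants rely on the position of each old string in b, so a longer list of replacements only needs longer b and d vectors.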

Upvotes: 1
