Reputation: 13
I want to create a new column in my data frame based on the values of another column that contains a set of strings. Some of the strings should be changed; the others should be kept as they are.
To keep things short, I want to do this using a vector of strings that specifies which strings I want to change and a vector of strings that I want to change the matches into.
I usually do this with mutate and case_when from the dplyr package. In the following code, I want to change Paul and Barbara to Anna and Fred respectively, while keeping the other names.
library(dplyr)
library(tibble)
a <- rep(c("Paul", "Barbara", "Joey", "Iris"), 3)
test <- enframe(a)
mutate(test,
       name2 = case_when(
         value == "Paul" ~ "Anna",
         value == "Barbara" ~ "Fred",
         TRUE ~ value)
)
Given that the real dataset is much longer, I would like to use vectors of strings as specified earlier. Using %in% b works to find the matching cells but using vector d to replace the hits throws an error:
b <- c("Paul", "Barbara") # only Paul and Barbara need to change
d <- c("Anna", "Fred") # they need to change to Anna and Fred
mutate(test,
       name2 = case_when(
         value %in% b ~ d,
         TRUE ~ value)
)
Error in `mutate()`:
! Problem while computing `name2 = case_when(value %in% b ~ d, TRUE ~ value)`.
Caused by error in `case_when()`:
! `value %in% b ~ d` must be length 12 or one, not 2.
Run `rlang::last_error()` to see where the error occurred.
I was hoping that if the match was with the second element of b, the second element of d would be used. Clearly it does not work that way, because value %in% b returns a vector of 12 TRUE/FALSE values while d has only 2 elements. Is there any way to work with vectors of strings like this?
Upvotes: 1
Views: 53
Reputation: 20409
I would do it like this:
lkp <- c("Anna","Fred") %>%
setNames(c("Paul", "Barbara"))
test %>%
mutate(name2 = coalesce(lkp[value], value))
# # A tibble: 12 × 3
# name value name2
# <int> <chr> <chr>
# 1 1 Paul Anna
# 2 2 Barbara Fred
# 3 3 Joey Joey
# 4 4 Iris Iris
# 5 5 Paul Anna
# 6 6 Barbara Fred
# 7 7 Joey Joey
# 8 8 Iris Iris
# 9 9 Paul Anna
# 10 10 Barbara Fred
# 11 11 Joey Joey
# 12 12 Iris Iris
The idea is to create a named vector whose values are the new values and whose names are the old values. You then do a simple lookup by name and use coalesce to replace the NAs (names not in the lookup vector) with the original values.
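As a minimal sketch of the same idea starting from the b and d vectors in the question: this assumes b and d have the same length and are paired by position. The match() variant is an alternative I am adding for illustration, not part of the code above; it makes the "i-th element of b maps to the i-th element of d" pairing explicit.

```r
library(dplyr)
library(tibble)

a <- rep(c("Paul", "Barbara", "Joey", "Iris"), 3)
test <- enframe(a)

b <- c("Paul", "Barbara")  # old values
d <- c("Anna", "Fred")     # new values, in the same order as b

# Named lookup vector: names are the old values, values are the new ones
lkp <- setNames(d, b)

# Indexing by name gives NA wherever the name is not in the lookup
unname(lkp[c("Paul", "Joey")])
# "Anna" NA

# match() returns the position of each value in b (NA if absent),
# so d[idx] picks the replacement at the same position, or NA
idx <- match(test$value, b)

# coalesce() falls back to the original value wherever d[idx] is NA
result <- test %>%
  mutate(name2 = coalesce(d[idx], value))
```

Both lkp[value] and d[idx] produce the same replacements here; the named vector is shorter to write, while match() keeps b and d as separate vectors.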
Upvotes: 1