Reputation: 1422
Let's say I have a long character string: pneumonoultramicroscopicsilicovolcanoconiosis. I'd like to use stringr::str_replace_all to replace certain letters with others. According to the documentation, str_replace_all can take a named vector and replaces each name with its value. That works fine for one replacement, but with several it seems to apply them iteratively, so later replacements act on the output of earlier ones. I'm not sure this is the intended behaviour.
library(tidyverse)
text_string = "developer"
text_string %>%
str_replace_all(c(e ="X")) #this works fine
[1] "dXvXlopXr"
text_string %>%
str_replace_all(c(e ="p", p = "e")) #not intended behaviour
[1] "develoeer"
Desired result:
[1] "dpvploepr"
Which I get by introducing a new character:
text_string %>%
str_replace_all(c(e ="X", p = "e", X = "p"))
It's a usable workaround but hardly generalisable. Is this a bug or are my expectations wrong?
I'd like to also be able to replace n letters with n other letters simultaneously, preferably using either two vectors (like "old" and "new") or a named vector as input.
reprex edited for easier human reading
Upvotes: 8
Views: 3625
Reputation: 4537
2023 Update
Back when I first answered this, I had a thrown-together R package that was only on my GitHub. Since then, I've refined it substantially, and it's now on CRAN and even used in other packages.
The readme and CRAN documentation spell all this out, but I understand how helpful code is on this page. The updated usage is based on passing in vectors of patterns and replacements. There's a recycle option that lets you supply a replacement vector that's shorter than the pattern vector and just keeps cycling through it. You can also pass arguments to regexpr in the backend (e.g. fixed=TRUE).
install.packages('mgsub')
library(mgsub)
mgsub("developer",
      pattern = c("e", "p"),
      replacement = c("p", "e"))
#> [1] "dpvploepr"
Original Answer
I'm working on a package to deal with this type of problem. It is safer than the qdap::mgsub function because it does not rely on placeholders. It fully supports regex in both the matching and the replacement. You provide a named list where the names are the strings to match on and the values are their replacements.
devtools::install_github("bmewing/mgsub")
library(mgsub)
mgsub("developer",list("e" ="p", "p" = "e"))
#> [1] "dpvploepr"
qdap::mgsub(c("e","p"),c("p","e"),"developer")
#> [1] "dpvploppr"
Upvotes: 9
Reputation: 3954
The iterative behavior is intended. That said, we can write our own workaround. I am going to use character subsetting for the replacement.
In a named vector, we can look up things by name and get a replacement value for each name. This is like doing all the replacements simultaneously.
rules <- c(a = "X", b = "Y", X = "a")
chars <- c("a", "a", "b", "X", "X")
rules[chars]
#> a a b X X
#> "X" "X" "Y" "a" "a"
So here, looking up "a" in the rules vector gets us "X", effectively replacing "a" with "X". The same goes for the other characters.
One problem is that names without a match yield NA.
rules <- c(a = "X", b = "Y", X = "a")
chars <- c("a", "Y", "Z")
rules[chars]
#> a <NA> <NA>
#> "X" NA NA
To prevent the NAs from appearing, we can expand the rules to include any new characters so that a character is replaced by itself.
rules <- c(a = "X", b = "Y", X = "a")
chars <- c("a", "Y", "Z")
no_rule <- chars[! chars %in% names(rules)]
rules2 <- c(rules, setNames(no_rule, no_rule))
rules2[chars]
#> a Y Z
#> "X" "Y" "Z"
And that's the logic behind the following function.
library(stringr)
str_replace_chars <- function(string, rules) {
  # Expand rules to replace characters with themselves
  # if those characters do not have a replacement rule
  chars <- unique(unlist(strsplit(string, "")))
  complete_rules <- setNames(chars, chars)
  complete_rules[names(rules)] <- rules
  # Split each string into characters, replace and unsplit
  for (string_i in seq_along(string)) {
    chars_i <- unlist(strsplit(string[string_i], ""))
    string[string_i] <- paste0(complete_rules[chars_i], collapse = "")
  }
  string
}
rules <- c(a = "X", p = "e", e = "p")
string <- c("application", "developer")
str_replace_chars(string, rules)
#> [1] "XeelicXtion" "dpvploepr"
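For single-character swaps like these, base R's chartr() may already be enough: it maps each character in old to the character at the same position in new in a single pass, so the swaps don't interfere with each other. A minimal sketch, where swap_chars is a hypothetical helper name and old and new are assumed to be equal-length vectors of single characters:

```r
# Hypothetical helper: simultaneous single-character replacement via base R chartr()
swap_chars <- function(string, old, new) {
  # Assumes every element of old and new is exactly one character
  stopifnot(length(old) == length(new),
            all(nchar(old) == 1), all(nchar(new) == 1))
  chartr(paste(old, collapse = ""), paste(new, collapse = ""), string)
}

swap_chars(c("application", "developer"),
           old = c("a", "p", "e"), new = c("X", "e", "p"))
#> [1] "XeelicXtion" "dpvploepr"
```

Unlike the function above, this only covers one-character-to-one-character rules; anything longer needs a different approach.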
Upvotes: 1
Reputation: 643
My workaround would be to take advantage of the fact that str_replace_all can take functions as an input for the replacement.
library(stringr)
text_string = "developer"
pattern <- "p|e"
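# Look each match up in a named vector; this also works when
# str_replace_all passes all matches at once as a character vector
fun <- function(query) {
  swaps <- c(e = "p", p = "e")
  unname(swaps[query])
}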
str_replace_all(text_string, pattern, fun)
Of course, if you need to scale up, I would suggest using a more sophisticated function.
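One way to scale this up to an arbitrary set of rules is to build the pattern from the names of a named vector and look each match up in it. The sketch below does that with base R's gregexpr() and regmatches() rather than stringr's function replacement. replace_simultaneously is a hypothetical name, and the sketch assumes the names are literal strings with no regex metacharacters and that no name is a prefix of another.

```r
# Hypothetical helper: apply all rules in a single pass over the string
replace_simultaneously <- function(string, rules) {
  # Assumes names(rules) are literal strings without regex metacharacters
  pattern <- paste(names(rules), collapse = "|")
  hits <- gregexpr(pattern, string)
  # Replace every match with the value stored under its name
  regmatches(string, hits) <- lapply(
    regmatches(string, hits),
    function(found) unname(rules[found])
  )
  string
}

replace_simultaneously("developer", c(e = "p", p = "e"))
#> [1] "dpvploepr"
```

Because each match is replaced from the original text and never rescanned, a swap such as e to p and p to e cannot undo itself.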
Upvotes: 2
Reputation: 1076
The replacements are probably applied in order, so after replacing every c with s and then every s with c, only c remains. Try going through placeholder characters that don't occur in the string:
long_string %>% str_replace_all(c(c ="X", s = "U")) %>% str_replace_all(c(X ="s", U = "c"))
Upvotes: 1