Reputation: 397
I have columns in a dataframe where I want to replace integers with their corresponding string values. A cell often contains several integers (separated by spaces, commas, /, - and so on). For example, my dataframe column is:
> df = data.frame(c1=c(1,2,3,23,c('11,21'),c('13-23')))
> df
c1
1 1
2 2
3 3
4 23
5 11,21
6 13-23
I have tried both str_replace_all() and str_replace(), but did not get the desired results.
> df[,1] %>% str_replace_all(c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g"))
[1] "a" "b" "c" "bc" "aa,ba" "ac-bc"
> df[,1] %>% str_replace(c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g"))
Error in fix_replacement(replacement) : argument "replacement" is missing, with no default
The desired result would be:
[1] "a" "b" "c" "g" "d,f" "e-g"
Because there are multiple values to replace, my first choice was str_replace_all(), since it accepts a named vector that maps the original column values to the desired replacements, but the method fails because of how the patterns are matched. Am I doing something wrong, or is there a better alternative to solve my problem?
Upvotes: 1
Reputation: 28705
Using the ordering method in @GKi's answer, here's a base R version using Reduce/gsub instead of stringr::str_replace_all.
Starting vector
x <- as.character(df$c1)
Ordering as in @GKi's answer
repl_dict <- c("11"="d","13"="e","21"="f","23"="g","1"="a","2"="b","3"="c")
repl_dict <- repl_dict[order(nchar(names(repl_dict)), decreasing = TRUE)]
Replacement
Reduce(
  function(x, n) gsub(n, repl_dict[n], x, fixed = TRUE),
  names(repl_dict),
  init = x)
# [1] "a" "b" "c" "g" "d,f" "e-g"
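If every value to replace is a whole run of digits, one possible alternative is to look up each number as a unit, so the order of the dictionary no longer matters. This is only a sketch under that assumption: the `[0-9]+` pattern and the helper name `lookup` are mine, not part of the question.

```r
# Same values as df$c1 in the question, as a character vector
x <- c("1", "2", "3", "23", "11,21", "13-23")
repl_dict <- c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g")

# Hypothetical helper: map each token through the dictionary,
# leaving tokens that have no entry unchanged
lookup <- function(tok) {
  hit <- unname(repl_dict[tok])
  hit[is.na(hit)] <- tok[is.na(hit)]
  hit
}

m <- gregexpr("[0-9]+", x)   # positions of every run of digits
res <- x
regmatches(res, m) <- lapply(regmatches(x, m), lookup)
res
# [1] "a" "b" "c" "g" "d,f" "e-g"
```

Because "23" is matched as a single token, it can never be split into "2" and "3", whatever the order of repl_dict.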
Upvotes: 1
Reputation: 39747
Simply place the longer, multi-character patterns at the beginning, like:
library(stringr)
str_replace_all(df[,1],
c("11"="d","13"="e","21"="f","23"="g","1"="a","2"="b","3"="c"))
#[1] "a" "b" "c" "g" "d,f" "e-g"
and for more complex cases, sort the patterns by length:
x <- c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g")
x <- x[order(nchar(names(x)), decreasing = TRUE)]
str_replace_all(df[,1], x)
#[1] "a" "b" "c" "g" "d,f" "e-g"
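To see why the order matters: the output in the question suggests that the pattern/replacement pairs are applied one after another, in the order given. The sketch below mimics that with base R's gsub; the helper name `apply_in_order` is hypothetical, and it only imitates the behaviour rather than reproducing stringr's internals.

```r
# Hypothetical helper: apply each pattern/replacement pair in turn
apply_in_order <- function(s, dict) {
  Reduce(function(acc, p) gsub(p, dict[[p]], acc, fixed = TRUE),
         names(dict), s)
}

# Short patterns first: "2" and "3" are consumed before "23" is tried
res_short <- apply_in_order("23", c("2"="b", "3"="c", "23"="g"))
res_short
# [1] "bc"

# Longest pattern first: "23" is replaced as a whole
res_long <- apply_in_order("23", c("23"="g", "2"="b", "3"="c"))
res_long
# [1] "g"
```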
Upvotes: 3