Hanif

Reputation: 397

Replacing multiple numbers with string in a dataframe without regex in R

I have columns in a dataframe where I want to replace integers with their corresponding string values. A single cell often contains several integers (separated by spaces, commas, /, -, etc.). For example, my dataframe column is:

> df = data.frame(c1=c(1,2,3,23,c('11,21'),c('13-23')))
> df

     c1
1     1
2     2
3     3
4    23
5 11,21
6 13-23

I have tried both str_replace_all() and str_replace(), but neither gave the desired result.

> df[,1] %>% str_replace_all(c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g"))

[1] "a"     "b"     "c"     "bc"    "aa,ba" "ac-bc"
> df[,1] %>% str_replace(c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g"))

Error in fix_replacement(replacement) : argument "replacement" is missing, with no default

The desired result would be:

[1] "a"     "b"     "c"     "g"    "d,f" "e-g"

Because there are multiple values to replace, my first choice was str_replace_all(), since it accepts a named vector that maps the original values to their replacements. However, it fails because of the regex matching: "1" is also matched inside "11", "21", and so on. Am I doing it wrong, or is there a better alternative to solve my problem?

Upvotes: 1

Views: 152

Answers (2)

IceCreamToucan

Reputation: 28705

Using the ordering method in @GKi's answer, here's a base R version that uses Reduce/gsub instead of stringr::str_replace_all.

Starting vector

x <- as.character(df$c1)

Ordering as in @GKi's answer (longest patterns first)

repl_dict <- c("11"="d","13"="e","21"="f","23"="g","1"="a","2"="b","3"="c")
repl_dict <- repl_dict[order(nchar(names(repl_dict)), decreasing = TRUE)]

Replacement

Reduce(
  function(x, n) gsub(n, repl_dict[n], x, fixed = TRUE),
  names(repl_dict),
  init = x)

#  [1] "a"   "b"   "c"   "g"   "d,f" "e-g"
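
To see why the ordering step matters, here is a minimal sketch that runs the same Reduce/gsub loop twice, once with the dictionary in its original order and once sorted longest-first. The helper name apply_in_order and the vector names are hypothetical and used only for illustration; the input vector is typed out by hand so the snippet runs without df.

```r
# Input values from the question, written out so the example is self-contained
x <- c("1", "2", "3", "23", "11,21", "13-23")

# Dictionary in the order used in the question (short patterns first)
short_first <- c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g")

# Same dictionary, longest patterns first
long_first <- short_first[order(nchar(names(short_first)), decreasing = TRUE)]

# Hypothetical helper: apply each literal replacement in dictionary order
apply_in_order <- function(x, dict) {
  Reduce(
    function(acc, n) gsub(n, dict[[n]], acc, fixed = TRUE),
    names(dict),
    x)
}

res_short <- apply_in_order(x, short_first)
res_long  <- apply_in_order(x, long_first)

print(res_short)
# [1] "a"     "b"     "c"     "bc"    "aa,ba" "ac-bc"
print(res_long)
# [1] "a"   "b"   "c"   "g"   "d,f" "e-g"
```

With the short patterns first, "1", "2" and "3" consume the digits before "11" or "23" ever get a chance to match, which reproduces the unwanted output from the question.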

Upvotes: 1

GKi

Reputation: 39747

Simply place the longest (multi-character) patterns at the beginning, so they are replaced before the shorter ones, like:

library(stringr)

str_replace_all(df[,1], 
 c("11"="d","13"="e","21"="f","23"="g","1"="a","2"="b","3"="c"))
#[1] "a"   "b"   "c"   "g"   "d,f" "e-g"

and for more complex cases, sort the patterns by length automatically:

x <- c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g")
x <- x[order(nchar(names(x)), decreasing = TRUE)]
str_replace_all(df[,1], x)
#[1] "a"   "b"   "c"   "g"   "d,f" "e-g"
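
A possible alternative that does not depend on pattern order at all is to treat every run of digits as one token and look each token up in the dictionary. The following is only a sketch in base R under that assumption (integers are delimited by any non-digit characters); replace_tokens and lookup are hypothetical names, and it does use a regex, but only to locate the digit runs.

```r
# Input values from the question, written out so the example is self-contained
x <- c("1", "2", "3", "23", "11,21", "13-23")

# Dictionary in any order; no sorting by length is needed here
lookup <- c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g")

# Hypothetical helper: replace each whole run of digits via the lookup table
replace_tokens <- function(x, lookup) {
  m <- gregexpr("[0-9]+", x)            # positions of every digit run
  regmatches(x, m) <- lapply(regmatches(x, m), function(tok) {
    out <- unname(lookup[tok])          # NA where a token has no entry
    miss <- is.na(out)
    out[miss] <- tok[miss]              # leave unknown tokens unchanged
    out
  })
  x
}

res <- replace_tokens(x, lookup)
print(res)
# [1] "a"   "b"   "c"   "g"   "d,f" "e-g"
```

Because "23" is matched as a single token, it can never be split into "2" and "3", and numbers missing from the dictionary are left as they are.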

Upvotes: 3
