Eric

Reputation: 1389

Replace multiple patterns with multiple replacements

I have a string replacement table, and I need to apply all of its replacement patterns to a target data frame. A single cell can contain more than one string to replace. Targets that are not in the replacement table should be converted to NA. I managed this with nested loops, but the result is slow and ugly. I could use some ideas on how to code this better. Thanks. Here is an example:

library(tibble)
# define the replacement table
rt <- tribble(
  ~to.replace, ~replace.with,
  "abc"      , "xyz",
  "def"      , "qwe",
  "lkj"      , "dffg",
  "cvb"      , "mnb"
)
# create a sample data.frame with some extra strings not in the replacement table
set.seed(1)
df <- data.frame(a = paste0(sample(c(rt$to.replace, "jhg", "ert", "ytr"), 10, replace = TRUE), " ; ",
                            sample(c(rt$to.replace, "jhg", "ert", "ytr"), 10, replace = TRUE)),
                 b = paste0(sample(c(rt$to.replace, "vfe", "thn", "mjh"), 10, replace = TRUE), " ; ",
                            sample(c(rt$to.replace, "vfe", "thn", "mjh"), 10, replace = TRUE)))
> df
           a         b
1  def ; def mjh ; cvb
2  lkj ; def def ; vfe
3  jhg ; jhg vfe ; cvb
4  ytr ; lkj abc ; def
5  def ; ert def ; thn
6  ytr ; cvb lkj ; vfe
7  ytr ; ert abc ; thn
8  jhg ; ytr lkj ; abc
9  jhg ; lkj mjh ; thn
10 abc ; ert lkj ; lkj
#  Here is what df is supposed to look like after applying all the replacements
> df
           a            b
1  qwe  ; qwe   NA   ; mnb
2  dffg ; qwe   qwe  ; NA
3  NA   ; NA    NA   ; mnb
4  NA   ; dffg  xyz  ; qwe
5  qwe  ; NA    qwe  ; NA
6  NA   ; mnb   dffg ; NA
7  NA   ; NA    xyz  ; NA
8  NA   ; NA    dffg ; xyz
9  NA   ; dffg  NA   ; NA
10 xyz  ; NA    dffg ; dffg

Upvotes: 3

Views: 1225

Answers (1)

akrun

Reputation: 886948

One option in base R is to split the strings in each column on " ; ", then use match() to look up each piece in 'rt' and replace it. Pieces with no match in 'rt' come back as NA.

# split each cell on " ; ", look each piece up in 'rt' with match(), then re-join the pieces
df[] <- lapply(df, function(x) sapply(strsplit(as.character(x), " ; "),
        function(y) paste(rt$replace.with[match(y, rt$to.replace)], collapse = " ; ")))
df
#          a           b
#1   qwe ; qwe    NA ; mnb
#2  dffg ; qwe    qwe ; NA
#3     NA ; NA    NA ; mnb
#4   NA ; dffg   xyz ; qwe
#5    qwe ; NA    qwe ; NA
#6    NA ; mnb   dffg ; NA
#7     NA ; NA    xyz ; NA
#8     NA ; NA  dffg ; xyz
#9   NA ; dffg     NA ; NA
#10   xyz ; NA dffg ; dffg
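
To see how the lookup behaves on individual cells, here is a minimal sketch of the same split / match / paste idea wrapped in a function. The helper name replace_cells and its sep argument are made up for illustration and are not part of the answer above. The replacement table is rebuilt as a plain data.frame so the snippet runs without tibble.

```r
# Replacement table from the question, as a plain data.frame (no packages needed)
rt <- data.frame(
  to.replace   = c("abc", "def", "lkj", "cvb"),
  replace.with = c("xyz", "qwe", "dffg", "mnb"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (illustration only): translate every cell of one vector
replace_cells <- function(x, table, sep = " ; ") {
  # fixed = TRUE treats the separator as a literal string, not a regex
  pieces <- strsplit(as.character(x), sep, fixed = TRUE)
  vapply(pieces, function(p) {
    # match() gives NA for pieces missing from the table,
    # so indexing replace.with with it yields NA for those pieces
    paste(table$replace.with[match(p, table$to.replace)], collapse = sep)
  }, character(1))
}

# "jhg" is not in the table, "lkj" is its third entry
match(c("jhg", "lkj"), rt$to.replace)
# [1] NA  3

cells  <- c("def ; def", "jhg ; lkj", "abc ; ert")
result <- replace_cells(cells, rt)
result
# [1] "qwe ; qwe" "NA ; dffg" "xyz ; NA"
```

Note that paste() turns a missing value into the text "NA", so the cells end up holding the literal string "NA" rather than a real NA. With such a helper, the whole data frame could presumably be handled with df[] <- lapply(df, replace_cells, table = rt).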

Upvotes: 2
