Reputation: 87
I am trying to replace occurrences of a string in a data.frame of strings with strings from another data.frame.
Replacement strings
# definition of replacement strings
repl1 <- data.frame(as.character(1:10))
repl2 <- data.frame(as.character(10:1))
Multiple base strings where substrings should be replaced
# base strings which I want to replace (defined after repl1, since it uses nrow(repl1))
base <- data.frame(cmd = rep("this is my example <repl1> and here second <repl2> ...", nrow(repl1)))
I tried to iterate over the data.frame with lapply...
# what I have tried
lapply(base, function(x) {gsub("<repl1>", repl1, x)})
As a result I get the following ...
[1] "this is my example c(1, 3, 4, 5, 6, 7, 8, 9, 10, 2) and here second <repl2> ..."
[2] "this is my example c(1, 3, 4, 5, 6, 7, 8, 9, 10, 2) and here second <repl2> ..."
[3] "this is my example c(1, 3, 4, 5, 6, 7, 8, 9, 10, 2) and here second <repl2> ..."
but I would like to achieve ...
[1] "this is my example 1 and here second 10 ..."
[2] "this is my example 2 and here second 9 ..."
[3] "this is my example 3 and here second 8 ..."
Thanks for any suggestions :)
Upvotes: 1
Views: 902
Reputation: 79238
Well, we can use the vectorized regmatches function here. That will remove all the loops:
First since your replacements are in different dataframes, combine them together:
repl3 <- cbind(A=repl1,B=repl2)
We have one more problem. The way you created your dataframe, the character columns have class factor (the default in R before 4.0, where stringsAsFactors was TRUE). So I will just change that:
s <- as.character(base$cmd)
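To see why the conversion matters, here is a minimal sketch of what a factor column holds. It assumes the column is built with stringsAsFactors = TRUE, as older R versions did by default; the column name x and the variables codes and labels are only illustrative. The levels are sorted as text ("1", "10", "2", ...), so the internal codes are the same numbers that showed up in the question's unwanted output.

```r
# Build the column the way pre-4.0 R did by default (stringsAsFactors = TRUE)
repl1 <- data.frame(x = as.character(1:10), stringsAsFactors = TRUE)

class(repl1$x)                   # "factor"

# Internal level codes, not the original strings:
# 1 3 4 5 6 7 8 9 10 2 (levels are sorted as text: "1", "10", "2", ...)
codes <- as.integer(repl1$x)

# as.character() recovers the original strings "1", "2", ..., "10"
labels <- as.character(repl1$x)
```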
From here on we replace directly:
regmatches(s,gregexpr("<repl1>|<repl2>",s))<- strsplit(do.call(paste,repl3)," ")
s
[1] "this is my example 1 and here second 10 ..."
[2] "this is my example 2 and here second 9 ..."
[3] "this is my example 3 and here second 8 ..."
[4] "this is my example 4 and here second 7 ..."
[5] "this is my example 5 and here second 6 ..."
[6] "this is my example 6 and here second 5 ..."
[7] "this is my example 7 and here second 4 ..."
[8] "this is my example 8 and here second 3 ..."
[9] "this is my example 9 and here second 2 ..."
[10] "this is my example 10 and here second 1 ..."
All this extra code is needed because each time you created a dataframe, you forgot to set the stringsAsFactors=F option. Had you done so, the code would be simpler:
v=as.character(base$cmd)
repl4=data.frame(1:10,10:1,stringsAsFactors=F)
regmatches(v,gregexpr("<repl1>|<repl2>",v))<-data.frame(t(repl4))
v
[1] "this is my example 1 and here second 10 ..."
[2] "this is my example 2 and here second 9 ..."
[3] "this is my example 3 and here second 8 ..."
[4] "this is my example 4 and here second 7 ..."
[5] "this is my example 5 and here second 6 ..."
[6] "this is my example 6 and here second 5 ..."
[7] "this is my example 7 and here second 4 ..."
[8] "this is my example 8 and here second 3 ..."
[9] "this is my example 9 and here second 2 ..."
[10] "this is my example 10 and here second 1 ..."
Upvotes: 2
Reputation: 24079
You need to index both the base data frame and repl1 data frame. Your code is passing the entire repl1 data frame to each row of the base data frame.
Try this:
# definition of replacement strings
repl1 <- data.frame(as.character(1:10))
repl2 <- data.frame(as.character(10:1))
# base strings which I want to replace
base <- data.frame(cmd = rep("this is my example <repl1> and here second <repl2> ...", nrow(repl1)))
answer<-sapply(1:nrow(repl1), function(x) {gsub("<repl1>", repl1[x,1], base[x,1])})
Now repeat with answer and the repl2 data frame.
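A minimal sketch of both steps together might look like this. It assumes stringsAsFactors = FALSE when building the data frames (not in the code above, but it keeps the columns as plain character) and uses fixed = TRUE so the placeholders are matched literally rather than as regular expressions.

```r
# Replacement strings and base strings, as in the question
repl1 <- data.frame(as.character(1:10), stringsAsFactors = FALSE)
repl2 <- data.frame(as.character(10:1), stringsAsFactors = FALSE)
base <- data.frame(
  cmd = rep("this is my example <repl1> and here second <repl2> ...", nrow(repl1)),
  stringsAsFactors = FALSE
)

# Step 1: replace <repl1> row by row
answer <- sapply(seq_len(nrow(repl1)), function(i) {
  gsub("<repl1>", repl1[i, 1], base[i, 1], fixed = TRUE)
})

# Step 2: feed the result of step 1 back in and replace <repl2>
answer <- sapply(seq_len(nrow(repl2)), function(i) {
  gsub("<repl2>", repl2[i, 1], answer[i], fixed = TRUE)
})

answer[1:3]
# [1] "this is my example 1 and here second 10 ..."
# [2] "this is my example 2 and here second 9 ..."
# [3] "this is my example 3 and here second 8 ..."
```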
Addition:
An alternative is the str_replace function in the stringr library:
library(stringr)
answer<-str_replace(base[,1], "<repl1>", as.character(repl1[,1]))
This will most likely be faster than the sapply method.
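Since str_replace is vectorized over both the input strings and the replacements, the two placeholders can be handled by chaining two calls. This is only a sketch: it assumes the data frames are built with stringsAsFactors = FALSE, and it wraps the patterns in fixed() so they are matched literally; step1 is just an illustrative intermediate name.

```r
library(stringr)

# Replacement strings and base strings, as in the question
repl1 <- data.frame(as.character(1:10), stringsAsFactors = FALSE)
repl2 <- data.frame(as.character(10:1), stringsAsFactors = FALSE)
base <- data.frame(
  cmd = rep("this is my example <repl1> and here second <repl2> ...", nrow(repl1)),
  stringsAsFactors = FALSE
)

# Element i of the replacement vector goes into element i of the input
step1  <- str_replace(base[, 1], fixed("<repl1>"), repl1[, 1])
answer <- str_replace(step1, fixed("<repl2>"), repl2[, 1])

answer[1:3]
# [1] "this is my example 1 and here second 10 ..."
# [2] "this is my example 2 and here second 9 ..."
# [3] "this is my example 3 and here second 8 ..."
```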
Upvotes: 1