kamashay

Reputation: 107

R - sequentially replace string using a data frame of strings

I'm trying to build a function F that replaces substrings of a target string 'str' using a data frame of strings 'df', column by column and row by row. Each column name is the substring to be replaced, and that column's values are the replacements. The output is a character vector of length 'rownum' (one string per row), with 'colnum' replacements applied to each string.

An example illustrates it best:

str <- "Hi, I am name and I am age years old! - said name "

df <- data.frame(name = c('John', 'Richard','Edward'), age =c('10','26','12'))

F(str,df)

"Hi, I am John and I am 10 years old! - said John "

"Hi, I am Richard and I am 26 years old! - said Richard "

"Hi, I am Edward and I am 12 years old! - said Edward "

I have written a function for the job:

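F <- function(str,df)
{
  x <- str
  # Replace each column name in turn; mapply() pairs every row's value with
  # that row's current string, because gsub() accepts a single replacement
  for(i in names(df)){
    x <- unname(mapply(gsub,i,df[[i]],x))
  }
  return(x)
}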

It seems to work, but I have the impression that it is neither efficient nor elegant.

  1. Is there a way to avoid the loop?
  2. Is mapply necessary?
  3. Can F work when 'str' is a multi-line text rather than a single line?

Thanks for your help.
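A minimal sketch touching on points 1 and 2, assuming the columns are character (stringsAsFactors = FALSE) and that column names should be matched literally. Reduce() can stand in for the explicit for loop, although it still iterates internally. Because gsub() takes a single replacement value, some per-row pairing such as mapply() (or vapply() over row indices) is still needed. F_reduce is a hypothetical name used only for illustration.

```r
# Sample inputs from the question; stringsAsFactors = FALSE keeps the columns character
str <- "Hi, I am name and I am age years old! - said name "
df  <- data.frame(name = c("John", "Richard", "Edward"),
                  age  = c("10", "26", "12"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: same result as F, with Reduce() in place of the for loop
F_reduce <- function(str, df) {
  # Reduce() threads the vector of partially filled strings through one step
  # per column; the starting value is one copy of str per row.
  Reduce(function(x, col) {
    # gsub() accepts only one replacement, so mapply() pairs each row's value
    # with that row's string; fixed = TRUE matches the column name literally.
    mapply(gsub, col, df[[col]], x,
           MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
  }, names(df), rep(str, nrow(df)))
}

F_reduce(str, df)
```

On point 3: gsub() treats a string containing "\n" like any other character data, so a multi-line str is handled the same way and each row yields one multi-line result.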

Upvotes: 1

Views: 277

Answers (4)

h3rm4n

Reputation: 4187

The most straightforward approach (as presented by @RomanLustrik in the comments):

str <- "Hi, I am %s and I am %s years old! - said %s "
sprintf(str, df$name, df$age, df$name)

The result:

[1] "Hi, I am John and I am 10 years old! - said John "      
[2] "Hi, I am Richard and I am 26 years old! - said Richard "
[3] "Hi, I am Edward and I am 12 years old! - said Edward "  
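R's sprintf() also accepts numbered references of the form %n$s, so a value that appears twice in the template can be passed only once. A small sketch of the same approach, assuming character columns:

```r
df <- data.frame(name = c("John", "Richard", "Edward"),
                 age  = c("10", "26", "12"),
                 stringsAsFactors = FALSE)

# %1$s and %2$s refer to the 1st and 2nd arguments after the format string,
# so df$name is supplied once even though the template uses it twice.
fmt <- "Hi, I am %1$s and I am %2$s years old! - said %1$s "
out <- sprintf(fmt, df$name, df$age)
print(out)
```

sprintf() is vectorised over its arguments, so one call still yields one string per row.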

Upvotes: 1

lukeA

Reputation: 54237

Maybe another option, which "hides" the for loop:

library(stringi)
f <- function(str, df) 
  apply(df, 1, stri_replace_all, str=str, fixed=names(df), merge=T, vec=F)  
f("Hi, I am name and I am age years old! - said name ", df)
# [1] "Hi, I am John and I am 10 years old! - said John "      
# [2] "Hi, I am Richard and I am 26 years old! - said Richard "
# [3] "Hi, I am Edward and I am 12 years old! - said Edward "

str <- "Hi, I am name and I am age years old! - said name\n
Hi, I am name and I am age years old! - said name"
f(str, df)
# [1] "Hi, I am John and I am 10 years old! - said John\n\nHi, I am John and I am 10 years old! - said John"            
# [2] "Hi, I am Richard and I am 26 years old! - said Richard\n\nHi, I am Richard and I am 26 years old! - said Richard"
# [3] "Hi, I am Edward and I am 12 years old! - said Edward\n\nHi, I am Edward and I am 12 years old! - said Edward"

Upvotes: 2

Rentrop

Reputation: 21497

Mustache is a great solution for this kind of template-based string manipulation. For simple strings/templates I would go with sprintf as well, but for more complex templates I would definitely use Mustache.

The R implementation of Mustache is the whisker package.

In your case this could be done e.g. via:

#install.packages("whisker")
library(whisker)
template <- 
"Hi, I am {{name}} and I am {{age}} years old! - 
said {{name}}"

df <- data.frame(name = c('John', 'Richard','Edward'), age =c('10','26','12'))

out <- apply(df, 1, function(x) whisker.render(template, x))

which gives you:

[1] "Hi, I am John and I am 10 years old! -\nsaid John"      
[2] "Hi, I am Richard and I am 26 years old! -\nsaid Richard"
[3] "Hi, I am Edward and I am 12 years old! -\nsaid Edward" 

The line break (\n) is preserved in the output.

You can also use readLines to read your template from a file instead of hard-coding it.
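A sketch of the readLines() route, with a textConnection() standing in for a template file (in practice you would pass a file path). So that the example runs without whisker installed, the rendering step uses a hypothetical base-R helper, render_braces(), that only substitutes {{key}} tags. It is a simplified stand-in, not whisker's own logic (for example, it skips Mustache's HTML escaping and sections). The {{ }} delimiters are also what keep an ordinary occurrence of a word like "name" in the text from being replaced by accident.

```r
# Stand-in for a template file: each element of the vector is one line
con <- textConnection(c("Hi, I am {{name}} and I am {{age}} years old! -",
                        "said {{name}}"))
template <- paste(readLines(con), collapse = "\n")  # keep the line break
close(con)

# Hypothetical helper (not part of whisker): replaces every "{{key}}" with the
# row's value for that key, matching the tag literally (fixed = TRUE).
render_braces <- function(template, row) {
  Reduce(function(s, key) gsub(paste0("{{", key, "}}"), row[[key]], s, fixed = TRUE),
         names(row), template)
}

df <- data.frame(name = c("John", "Richard", "Edward"),
                 age  = c("10", "26", "12"),
                 stringsAsFactors = FALSE)

# One rendered string per row; as.list(df[i, ]) gives a named list of that row's values
out <- vapply(seq_len(nrow(df)),
              function(i) render_braces(template, as.list(df[i, ])),
              character(1))
print(out)
```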

Upvotes: 1

akrun

Reputation: 887048

We can do this programmatically (inspired by @RomanLustrik's idea):

do.call(sprintf, c(cbind(df, name2=df$name), fmt = gsub("name|age", "%s", str)))
#[1] "Hi, I am John and I am 10 years old! - said John "    
#[2] "Hi, I am Richard and I am 26 years old! - said Richard "
#[3] "Hi, I am Edward and I am 12 years old! - said Edward "  
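The same idea can be generalised so that the argument list is derived from the template itself instead of adding a name2 column by hand. A sketch, assuming the column names are plain words without regex metacharacters; fill_sprintf is a hypothetical name. regmatches() with gregexpr() returns the placeholders in the order they occur, repeats included, and literal % signs are doubled first so that sprintf() does not misread them.

```r
str <- "Hi, I am name and I am age years old! - said name "
df  <- data.frame(name = c("John", "Richard", "Edward"),
                  age  = c("10", "26", "12"),
                  stringsAsFactors = FALSE)

# Hypothetical helper; assumes names(df) contain no regex metacharacters
fill_sprintf <- function(str, df) {
  pattern <- paste(names(df), collapse = "|")
  # Placeholder names in the order they occur in the template (repeats included)
  used <- regmatches(str, gregexpr(pattern, str))[[1]]
  # Escape literal % signs, then turn each placeholder into %s
  fmt <- gsub(pattern, "%s", gsub("%", "%%", str, fixed = TRUE))
  # One argument per occurrence, so no hand-made duplicate column is needed
  args <- lapply(used, function(key) df[[key]])
  do.call(sprintf, c(list(fmt), args))
}

fill_sprintf(str, df)
```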

Upvotes: 0
