Sri

Reputation: 43

R - How to replace a string from multiple matches (in a data frame)

I need to replace parts of a string using match/replacement pairs that are stored in a data frame.

For example -

input_string = "Whats your name and Where're you from"

The parts to replace, and their replacements, come from a data frame. Say the data frame is

matching <- data.frame(from_word=c("Whats your name", "name", "fro"),
            to_word=c("what is your name","names","froth"))

The expected output is "what is your name and Where're you from"

Note -

  1. The longest match should win. In this example, "name" is not replaced with "names", because "name" is part of a bigger match ("Whats your name")
  2. Only whole words should match, not partial strings. The "fro" in "from" should not be replaced with "froth"
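
To make the two rules concrete, here is a minimal sketch (base R only; `naive` is just an illustrative name) of what a plain sequential replacement does with this data. It breaks both rules at once:

```r
input_string <- "Whats your name and Where're you from"
matching <- data.frame(from_word = c("Whats your name", "name", "fro"),
                       to_word   = c("what is your name", "names", "froth"),
                       stringsAsFactors = FALSE)

# apply every pair in turn, as literal (fixed) substrings
naive <- input_string
for (i in seq_len(nrow(matching))) {
  naive <- gsub(matching$from_word[i], matching$to_word[i], naive, fixed = TRUE)
}
naive
# "what is your names and Where're you frothm"
```

"name" gets replaced again inside the already replaced phrase (rule 1), and "fro" is replaced inside "from" (rule 2).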

I referred to the link below, but could not get it to work as described above:

Match and replace multiple strings in a vector of text without looping in R

This is my first post here. If I haven't given enough details, please let me know.

Upvotes: 4

Views: 1006

Answers (3)

ira

Reputation: 2644

Edit

Based on the input from Sri's comment I would suggest using:

library(gsubfn)
# words to be replaced
a <-c("Whats your","Whats your name", "name", "fro")
# their replacements
b <- c("What is yours","what is your name","names","froth")
# named list as an input for gsubfn
replacements <- setNames(as.list(b), a)
# the test string
input_string = "fro Whats your name and Where're name you from to and fro I Whats your"
# match entire words
gsubfn(paste(paste0("\\w*", names(replacements), "\\w*"), collapse = "|"), replacements, input_string)
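
Why the `\\w*` padding helps: it widens every hit to the whole word, and, as far as I understand the gsubfn documentation, a match that is not a name in the replacement list is left unchanged. The following is only a base R sketch of that lookup idea (it does not call gsubfn; `lookup`, `hits` and `new` are illustrative names):

```r
x <- "you from to and fro"
lookup <- c(fro = "froth")

# widen each hit to the full word that contains "fro"
m <- gregexpr("\\w*fro\\w*", x, perl = TRUE)
hits <- regmatches(x, m)[[1]]        # "from" "fro"

# replace only hits that are exact names in the lookup, keep the rest as they are
new <- ifelse(hits %in% names(lookup), lookup[hits], hits)
regmatches(x, m) <- list(unname(new))
x
# "you from to and froth"
```

"from" is matched as a whole word, is not found in the lookup, and so stays untouched, while the standalone "fro" is replaced.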

Original

I would not say this is easier to read than your simple loop, but it might take better care of the overlapping replacements:

# define the sample dataset
input_string = "Whats your name and Where're you from"
matching <- data.frame(from_word=c("Whats your name", "name", "fro", "Where're", "Whats"),
                       to_word=c("what is your name","names","froth", "where are", "Whatsup"))

# load used library
library(gsubfn)

# make sure data is of class character
matching$from_word <- as.character(matching$from_word)
matching$to_word <- as.character(matching$to_word)

# extract the words in the sentence
# (base strsplit() is used here, since str_split() would need library(stringr))
test <- unlist(strsplit(input_string, " ", fixed = TRUE))
# find where individual words from sentence match with the list of replaceable words
test2 <- sapply(paste0("\\b", test, "\\b"), grepl, matching$from_word)
# change rownames to see what is the format of output from the above sapply
rownames(test2) <- matching$from_word
# reorder the data so that largest replacement blocks are at the top
test3 <- test2[order(rowSums(test2), decreasing = TRUE),]
# where the word is already being replaced by larger chunk, do not replace again
test3[apply(test3, 2, cumsum) > 1] <- FALSE

# define the actual pairs of replacement
replacements <- setNames(as.list(as.character(matching[,2])[order(rowSums(test2), decreasing = TRUE)][rowSums(test3) >= 1]),
                         as.character(matching[,1])[order(rowSums(test2), decreasing = TRUE)][rowSums(test3) >= 1])

# perform the replacement
gsubfn(paste(as.character(matching[,1])[order(rowSums(test2), decreasing = TRUE)][rowSums(test3) >= 1], collapse = "|"),
       replacements,input_string)

Upvotes: 1

Sri

Reputation: 43

Was trying out different things and the below code seems to work.

a <-c("Whats your name", "name", "fro")
b <- c("what is your name","names","froth")
c <- c("Whats your name and Where're you from")

for(i in seq_along(a)) c <- gsub(paste0('\\<',a[i],'\\>'), gsub(" ","_",b[i]), c)
c <- gsub("_"," ",c)
c

I took help from this link: Making gsub only replace entire words?

However, I would like to avoid the loop if possible. Can someone please improve this answer so that it works without the loop?
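
One possible loop-free variant, as a sketch only: it uses base R `gregexpr()` and `regmatches<-` with a single alternation pattern. The helper name `replace_whole_matches` is made up for illustration, and it assumes every pattern starts and ends with a word character, so that `\\b` behaves as a whole-word anchor.

```r
replace_whole_matches <- function(x, from, to) {
  from <- as.character(from)
  to   <- as.character(to)
  # longest pattern first: with perl = TRUE the first alternative that fits wins
  ord  <- order(nchar(from), decreasing = TRUE)
  from <- from[ord]
  to   <- to[ord]
  # escape regex metacharacters so the patterns are treated literally
  esc <- gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", from)
  pattern <- paste0("\\b(?:", paste(esc, collapse = "|"), ")\\b")
  m <- gregexpr(pattern, x, perl = TRUE)
  # look up the replacement for each matched string
  regmatches(x, m) <- lapply(regmatches(x, m), function(hit) to[match(hit, from)])
  x
}

a <- c("Whats your name", "name", "fro")
b <- c("what is your name", "names", "froth")
result <- replace_whole_matches("Whats your name and Where're you from", a, b)
result
# "what is your name and Where're you from"
```

Because the whole string is scanned once, text that has already been replaced is never matched again, so the underscore trick is not needed here.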

Upvotes: 0

Aleksandr

Reputation: 1914

toreplace =list("x1" = "y1","x2" = "y2", ..., "xn" = "yn")

Each list element pairs a pattern with its replacement:

the name xi is the pattern (find what),
the value yi is the replacement (replace with).

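library(gsubfn)
input_string = "Whats your name and Where're you from"
# names are the patterns, values are the replacements ("name" -> "names", as in the question)
toreplace<-list("Whats your name" = "what is your name", "name" = "names", "fro" = "froth")
gsubfn(paste(names(toreplace),collapse="|"),toreplace,input_string)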

Upvotes: 1
