Reputation: 1036
I have a data frame of words and their synonyms. In a separate data frame, I have sentences. I want to search each sentence for any of the synonyms and, when one is found, replace it with the word it is a synonym of.
dt = read.table(header = TRUE,
text ="Word Synonyms
Use 'employ, utilize, exhaust, spend, expend, consume, exercise'
Come 'advance, approach, arrive, near, reach'
Go 'depart, disappear, fade, move, proceed, recede, travel'
Run 'dash, escape, elope, flee, hasten, hurry, race, rush, speed, sprint'
Hurry 'rush, run, speed, race, hasten, urge, accelerate, bustle'
Hide 'conceal, cover, mask, cloak, camouflage, screen, shroud, veil'
", stringsAsFactors= F)
mydf = read.table(header = TRUE, stringsAsFactors = FALSE,
text ="sentence
'I can utilize this file'
'I can cover these things'
")
The desired output looks like this:
I can Use this file
I can Hide these things
Above is just a sample. In my real dataset, I have more than 10000 sentences.
Upvotes: 1
Views: 3190
Reputation: 20095
One can replace `, ` in `dt$Synonyms` with `|` so that it can be used as the `pattern` argument of `gsub`. Then use each `dt$Synonyms` entry as the pattern and replace any occurrence of one of its synonyms (separated by `|`) with the corresponding `dt$Word`. One can use `sapply` and `gsub` as:
Edited: Added a word-boundary check (as part of the pattern passed to `gsub`), as suggested by the OP.
# First replace each `, ` in dt$Synonyms with `\b|\b` and wrap the result in `\b`,
# so that dt$Synonyms can be used as the 'pattern' argument of `gsub`.
dt$Synonyms <- paste("\\b", gsub(", ", "\\\\b|\\\\b", dt$Synonyms), "\\b", sep = "")
# Loop through each row of 'dt' to replace synonyms with the word, using sapply
mydf$sentence <- sapply(mydf$sentence, function(x) {
  for (row in seq_len(nrow(dt))) {
    x <- gsub(dt$Synonyms[row], dt$Word[row], x)
  }
  x
})
mydf
# sentence
# 1 I can Use this file
# 2 I can Hide these things
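To make the pattern construction concrete, here is a minimal sketch for a single row, using a shortened synonym list for "Hide". The variable names (`syn`, `pattern`, `replaced`, `unchanged`) are illustrative only and are not part of the answer's code.

```r
# One row's synonyms, shortened for illustration
syn <- "conceal, cover, mask"

# Turn ", " separators into "\b|\b" and wrap the whole string in "\b"
pattern <- paste0("\\b", gsub(", ", "\\\\b|\\\\b", syn), "\\b")
print(pattern)
# [1] "\\bconceal\\b|\\bcover\\b|\\bmask\\b"

# A whole-word synonym is replaced
replaced <- gsub(pattern, "Hide", "I can cover these things")
print(replaced)
# [1] "I can Hide these things"

# "cover" inside a longer word is left alone thanks to the word boundaries
unchanged <- gsub(pattern, "Hide", "I discovered these things")
print(unchanged)
# [1] "I discovered these things"
```

Note that the match is case-sensitive, so a capitalized synonym such as "Cover" would not be replaced unless `ignore.case = TRUE` is passed to `gsub`.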
Upvotes: 2
Reputation: 18435
Here is a tidyverse solution...
library(stringr)
library(dplyr)
library(tidyr) # provides unnest()
dt2 <- dt %>%
  mutate(Synonyms = str_split(Synonyms, ",\\s*")) %>% # split into individual synonyms
  unnest(Synonyms) # this results in a long data frame of words and synonyms
mydf2 <- mydf %>%
  mutate(Synonyms = str_split(sentence, "\\s+")) %>% # split into words
  unnest(Synonyms) %>% # expand to long form, one word per row
  left_join(dt2, by = "Synonyms") %>% # match synonyms
  mutate(Word = ifelse(is.na(Word), Synonyms, Word)) %>% # keep unmatched words the same
  group_by(sentence) %>%
  summarise(sentence2 = paste(Word, collapse = " ")) # reconstruct sentences
mydf2
sentence sentence2
<chr> <chr>
1 I can cover these things I can Hide these things
2 I can utilize this file I can Use this file
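The same split-and-lookup idea can be sketched in base R, which may help show what the join is doing. This is only an illustration on a small subset of the question's data. `lookup` and `replace_synonyms` are hypothetical names, and the sketch assumes words are separated by whitespace with no attached punctuation.

```r
# Small subset of the synonym table from the question
dt <- data.frame(
  Word = c("Use", "Hide"),
  Synonyms = c("employ, utilize, exhaust", "conceal, cover, mask"),
  stringsAsFactors = FALSE
)

# Long form: a named vector mapping each synonym to its word
syn_list <- strsplit(dt$Synonyms, ",\\s*")
lookup <- setNames(rep(dt$Word, lengths(syn_list)), unlist(syn_list))

# Split a sentence into words, swap any word found in the lookup, rejoin
replace_synonyms <- function(sentence) {
  words <- strsplit(sentence, "\\s+")[[1]]
  hit <- words %in% names(lookup)
  words[hit] <- unname(lookup[words[hit]])
  paste(words, collapse = " ")
}

sentences <- c("I can utilize this file", "I can cover these things")
result <- vapply(sentences, replace_synonyms, character(1), USE.NAMES = FALSE)
print(result)
# [1] "I can Use this file"     "I can Hide these things"
```

If a synonym appears under more than one word (for example "rush" under both Run and Hurry in the full table), this lookup picks the first match, whereas a join would produce one row per match.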
Upvotes: 1