john

Reputation: 1036

Replace words in R

I have a data frame that maps each word to a list of its synonyms. In a different data frame, I have sentences. I want to search each sentence for any of the synonyms and, when one is found, replace it with the word that the synonym belongs to.

dt = read.table(header = TRUE, 
text ="Word Synonyms
Use 'employ, utilize, exhaust, spend, expend, consume, exercise'
Come    'advance, approach, arrive, near, reach'
Go  'depart, disappear, fade, move, proceed, recede, travel'
Run 'dash, escape, elope, flee, hasten, hurry, race, rush, speed, sprint'
Hurry   'rush, run, speed, race, hasten, urge, accelerate, bustle'
Hide    'conceal, cover, mask, cloak, camouflage, screen, shroud, veil'
", stringsAsFactors = FALSE)


mydf = read.table(header = TRUE, stringsAsFactors = FALSE,
                  text = "sentence
'I can utilize this file'
'I can cover these things'
")

The desired output is:

I can Use this file
I can Hide these things

Above is just a sample. In my real dataset, I have more than 10000 sentences.

Upvotes: 1

Views: 3190

Answers (2)

MKR

Reputation: 20095

Replace each ", " in dt$Synonyms with | so that the column can serve as the pattern argument of gsub. Then, for each row, use dt$Synonyms as the pattern and replace any matching synonym (the alternatives separated by |) with dt$Word. This can be done with sapply and gsub:

Edited: Added word-boundary check (as part of pattern in gsub) as suggested by OP.

# Turn each comma-separated synonym list into a regex alternation with
# word boundaries, e.g. \bemploy\b|\butilize\b, so that dt$Synonyms can be
# used as the 'pattern' argument of `gsub`.
dt$Synonyms <- paste0("\\b", gsub(", ", "\\\\b|\\\\b", dt$Synonyms), "\\b")

# For each sentence, loop through the rows of 'dt' and replace any synonym
# with its word. USE.NAMES = FALSE keeps the result an unnamed vector.
mydf$sentence <- sapply(mydf$sentence, function(x){
  for(row in seq_len(nrow(dt))){
    x = gsub(dt$Synonyms[row], dt$Word[row], x)
  }
  x
}, USE.NAMES = FALSE)

mydf
#                  sentence
# 1     I can Use this file
# 2 I can Hide these things
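
To illustrate why the word-boundary check matters, here is a minimal sketch on a single synonym list. The sentence and the shortened synonym string are made up for the illustration and are not part of the question's data.

```r
# Hypothetical one-row example (not taken from the question's data)
synonyms <- "near, reach"

# Pattern without word boundaries: "near|reach"
plain <- gsub(", ", "|", synonyms)

# Pattern with word boundaries, built the same way as in the answer above
bounded <- paste0("\\b", gsub(", ", "\\\\b|\\\\b", synonyms), "\\b")

sentence <- "We are nearly there"

# Without boundaries, "near" also matches inside "nearly"
without_boundary <- gsub(plain, "Come", sentence)

# With boundaries, only whole words match, so the sentence is left alone
with_boundary <- gsub(bounded, "Come", sentence)

print(without_boundary)
print(with_boundary)
```

Without the boundaries, any synonym that happens to be a substring of a longer word is rewritten as well, which is usually not what is wanted.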

Upvotes: 2

Andrew Gustar

Reputation: 18435

Here is a tidyverse solution...

library(stringr)
library(dplyr)
library(tidyr) # provides unnest()

dt2 <- dt %>% 
  mutate(Synonyms=str_split(Synonyms, ",\\s*")) %>% #split into words
  unnest(Synonyms) #this results in a long dataframe of words and synonyms

mydf2 <- mydf %>% 
  mutate(Synonyms=str_split(sentence, "\\s+")) %>% #split into words
  unnest(Synonyms) %>% #expand to long form, one word per row
  left_join(dt2, by="Synonyms") %>% #match synonyms
  mutate(Word=ifelse(is.na(Word), Synonyms, Word)) %>% #keep unmatched words the same
  group_by(sentence) %>% 
  summarise(sentence2=paste(Word, collapse=" ")) #reconstruct sentences

mydf2

  sentence                 sentence2              
  <chr>                    <chr>                  
1 I can cover these things I can Hide these things
2 I can utilize this file  I can Use this file   
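
The same split-and-look-up idea can also be sketched in base R with a named vector as the lookup table. This is only an illustration under stated assumptions, not the code from this answer: the names lookup and replace_words are hypothetical, the data is a shortened copy of the question's sample, and words are assumed to be separated by whitespace with no attached punctuation.

```r
# Shortened copy of the sample data from the question
dt <- data.frame(
  Word = c("Use", "Hide"),
  Synonyms = c("employ, utilize, exhaust", "conceal, cover, mask"),
  stringsAsFactors = FALSE
)
sentences <- c("I can utilize this file", "I can cover these things")

# Named lookup vector: names are the synonyms, values are the replacement words
syn_list <- strsplit(dt$Synonyms, ",\\s*")
lookup <- setNames(rep(dt$Word, lengths(syn_list)), unlist(syn_list))

# Hypothetical helper: split a sentence into words, swap any word that is a
# known synonym, and paste the sentence back together
replace_words <- function(sentence) {
  words <- strsplit(sentence, "\\s+")[[1]]
  hits <- words %in% names(lookup)
  words[hits] <- lookup[words[hits]]
  paste(words, collapse = " ")
}

result <- vapply(sentences, replace_words, character(1), USE.NAMES = FALSE)
print(result)
```

In this sketch the sentences keep their original order, and a synonym listed under more than one word (such as rush in the full data) resolves to the first word it appears under.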

Upvotes: 1
