user11015000

Reputation: 159

combining words in tm R is not achieving desired result

I am trying to combine a few words so that they count as one. In this example I want val and valuatin to be counted as valuation.

The code I have been using to try and do this is below:

#load in package
library(tm)

replaceWords <- function(x, from, keep){
  regex_pat <- paste(from, collapse = "|")
  gsub(regex_pat, keep, x)
}


oldwords <- c("val", "valuati")
newword  <- c("valuation")

TextDoc2 <- tm_map(TextDoc, replaceWords, from=oldwords, keep=newword)

However, this does not work as expected. Any time val appears inside a word, it is replaced with valuation. For example, equivalent becomes equivaluation. How do I get around this and achieve my desired result?

Upvotes: 2

Views: 44

Answers (1)

Ronak Shah

Reputation: 389235

Try this function -

replaceWords <- function(x, from, keep){
  regex_pat <- sprintf('\\b(%s)\\b', paste(from, collapse = '|'))
  gsub(regex_pat, keep, x)
}

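The pattern val also matches inside equivalent, because gsub replaces the substring wherever it occurs. Adding word boundaries (\b) stops that from happening, so only the standalone words val and valuati are replaced.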

grepl('val', 'equivalent')
#[1] TRUE
grepl('\\bval\\b', 'equivalent')
#[1] FALSE
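
To check the behaviour end to end, here is a minimal sketch that runs the function on a few strings in base R. The sample_text vector is made up for illustration and is not from the original data.

```r
# Word-boundary version of replaceWords, as defined above
replaceWords <- function(x, from, keep) {
  regex_pat <- sprintf("\\b(%s)\\b", paste(from, collapse = "|"))
  gsub(regex_pat, keep, x)
}

oldwords <- c("val", "valuati")
newword  <- "valuation"

# Hypothetical sample strings (assumption: not taken from TextDoc)
sample_text <- c("the val is equivalent", "valuati report", "valuation done")

# Only the standalone words "val" and "valuati" are replaced;
# "equivalent" and "valuation" are left untouched
result <- replaceWords(sample_text, from = oldwords, keep = newword)
print(result)
```

When applying this to a tm corpus, custom functions are usually wrapped in content_transformer(), along the lines of tm_map(TextDoc, content_transformer(replaceWords), from = oldwords, keep = newword). This assumes TextDoc is a tm corpus as in the question.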

Upvotes: 3
