Reputation: 49
I have a list of undesirable words (in Spanish) which are meaningless on their own, but which also appear inside other words. I want to remove them only when they stand alone as a term, not when they are part of another word.
For example, "la" is a Spanish article, but if I use a function to remove it, it will also break a useful term like "relacion" (which means relationship) into two words.
My first attempt was a function to remove these terms:
# bdtidy$tweet contains the tweets
fix.useless <- function(doc) {
doc <- gsub("la", ".", doc)
doc <- gsub("las", ".", doc)
doc <- gsub("el", ".", doc)
doc <- gsub("ellos", ".", doc)
doc <- gsub("ellas", ".", doc)
return(doc)
}
bdtidy$tweet <- sapply(bdtidy$tweet, fix.useless)
My second attempt was a vector of words, then filtering the data frame:
nousar <- c("rt", "pero", "para"...)
new_df <- bdtidy %>%
filter(!tweet %in% nousar)
But the result is always that every occurrence of those words is removed, breaking longer terms in two, which makes my analysis useless. Thanks.
Upvotes: 0
Views: 2172
Reputation: 2259
One way to remove single words from a string is to flank the target word with spaces, as in this example:
# sample input
x <- c("Get rid of la but not lala")
# pattern with spaces flanking target word
y <- gsub(" la ", " ", x)
# output
> y
[1] "Get rid of but not lala"
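A limitation of flanking with literal spaces is that the word is missed at the start or end of the string, or next to punctuation. A sketch using regex word boundaries (`\\b`) instead, which base R's `gsub` supports:

```r
# "\\b" matches a word boundary, so "la" is removed even at the
# start/end of the string or beside punctuation, while "lala" and
# "relacion" are left intact because the pattern cannot match inside
# a longer word.
x <- c("la casa", "Get rid of la but not lala")
y <- gsub("\\bla\\b", "", x, perl = TRUE)
# collapse the leftover double spaces and trim the edges
y <- trimws(gsub("\\s+", " ", y))
```

This is only a sketch; for a long stop word list you would build the pattern programmatically, e.g. `paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")`.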
Upvotes: 2
Reputation: 865
You can tokenize the words, that is, extract the individual words. Once they are extracted, you can check the tokens for matches and remove them. The stringr package can help you here:
#sample text
text <- "hola, me llamo struggles. El package 'stringr' puede resolver la pregunta."
#normalize text by making everything lowercase
lower_text <- stringr::str_to_lower(text)
#split text at anything that isn't a number or a letter
tokens <- stringr::str_split(lower_text, "[^[:alnum:]]+")
#create a list of stop words
stop_words <- c('la', 'las', 'el', 'ellos')
#remove words that are in the stop words vector
tokens[[1]][!tokens[[1]] %in% stop_words]
Since you'll probably be doing this with a lot of tweets, I suggest you also take a look at the tidytext package and read through the tutorial at https://www.tidytextmining.com/
df <- data.frame(
tweet = text,
tweet_id = 1234,
user = 'struggles',
stringsAsFactors = F
)
twitter_tokens <- tidytext::unnest_tokens(df, word, tweet)
clean_twitter_tokens <- dplyr::filter(twitter_tokens, !word %in% stop_words)
and this will give you something like
tweet_id user word
1 1234 struggles hola
2 1234 struggles me
3 1234 struggles llamo
4 1234 struggles struggles
5 1234 struggles package
6 1234 struggles stringr
7 1234 struggles puede
8 1234 struggles resolver
9 1234 struggles pregunta
And if you want to keep it together in one sentence then the following will bring it back:
clean_twitter_tokens %>%
dplyr::group_by(tweet_id, user) %>%
dplyr::summarize(tweet = stringr::str_c(word, collapse = ' '))
giving you
tweet_id user tweet
<dbl> <chr> <chr>
1 1234 struggles hola me llamo struggles package stringr puede resolver pregunta
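As a side note (this is an assumption, not part of the answer above): rather than maintaining the stop word vector by hand, the stopwords package ships prebuilt lists for many languages, including Spanish, which you can drop into the same filter:

```r
# Assumes the 'stopwords' package is installed (install.packages("stopwords")).
# stopwords("es") returns a character vector of common Spanish function
# words such as "la", "el", "las", which can replace a hand-written list.
stop_words_es <- stopwords::stopwords("es")
"la" %in% stop_words_es
```

The hand-written vector is still useful for domain-specific noise like "rt", so in practice you might combine both with `c(stop_words_es, "rt")`.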
Upvotes: 0