Stanley

Reputation: 2806

R text mining filtering string from text

Is there an existing R function that, given a text and a vector of strings, returns the strings from the vector that occur in the text?

For example,

x <- "This is a new way of doing things."
mywords <- c("This is", "new", "not", "maybe", "things.")
filtered_words <- Rfunc(x, mywords)

Then filtered_words will contain "This is", "new" and "things.".

Is there any such function?

Upvotes: 0

Views: 791

Answers (2)

Veera

Reputation: 869

# Split the text on spaces and keep the tokens that appear in mywords.
filterWords <- function(x, mywords) {
  splitwords <- unlist(strsplit(x, split = " "))
  splitwords[splitwords %in% mywords]
}

This is one approach. However, it will not find entries made up of more than one word, such as "This is", because the text is split on spaces before matching. Still, I thought it might give you a bit more to work with.
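If multi-word entries matter, one option is to skip the split and test each entry as a literal substring of the text instead. The following is only a sketch in base R; filter_phrases is a made-up name, not an existing function, and it assumes plain substring matching is acceptable (so "new" would also be found inside "renewal").

```r
# Hypothetical helper: keep the entries of `mywords` that occur in `x`
# as literal substrings (fixed = TRUE turns off regex interpretation).
filter_phrases <- function(x, mywords) {
  found <- vapply(mywords, function(w) grepl(w, x, fixed = TRUE), logical(1))
  mywords[found]
}

x <- "This is a new way of doing things."
mywords <- c("This is", "new", "not", "maybe", "things.")
filter_phrases(x, mywords)
#[1] "This is" "new"     "things."
```

Because nothing is split, "This is" is matched as a whole phrase.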

Upvotes: 0

akrun

Reputation: 887088

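We can use str_extract_all from library(stringr). It is vectorised over the pattern, so the output is a list with one element per entry of mywords (empty for entries that do not match). Calling unlist on it drops the empty elements and returns a single character vector.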

library(stringr)
unlist(str_extract_all(x, mywords))
#[1] "This is" "new"     "things."
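One caveat worth keeping in mind: str_extract_all treats each entry as a regular expression by default, so the "." in "things." matches any character. If the entries are meant literally, wrapping the pattern in stringr's fixed() is one way to handle that. The text x2 below is made up purely to show the difference.

```r
library(stringr)

x <- "This is a new way of doing things."
mywords <- c("This is", "new", "not", "maybe", "things.")
x2 <- "All thingsX here"   # made-up text with no literal "things."

# As a regex, "." matches any character, so "things." matches "thingsX".
unlist(str_extract_all(x2, "things."))
#[1] "thingsX"

# With fixed(), the dot is literal and nothing is extracted from x2.
unlist(str_extract_all(x2, fixed("things.")))

# The same wrapper applied to the whole vector from the question.
unlist(str_extract_all(x, fixed(mywords)))
```

For the example in the question both versions return the same three strings; the difference only shows up when an entry contains regex metacharacters.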

Upvotes: 1
