Analytics_Lover

Reputation: 1

In R, how do I match and extract a specific phrase from text, regardless of spacing or case?

I have the following sample text and I want to extract only Machine Learning from the text.

text <- c("Machine Learning is my fav topic.", "I love machinelearning.")

ifelse((found <- regexpr("\\sMachine Learning", text, 
                     perl =TRUE)) !=-1, substring(text, found, 
                          found+attr(found,"match.length")), "nothing found")

But it returns:

"nothing found" "nothing found"

The result I expect is:

"Machine Learning", "machinelearning"

Upvotes: 0

Views: 48

Answers (2)

suhao399

Reputation: 648

Two points:

1) To match both forms you mention, use the pattern "machine\\s?learning" together with ignore.case = TRUE. The ? after \\s makes the whitespace optional, so the pattern matches with or without a space between the two words.

2) Use regexpr() to find the match, then regmatches() to extract the matched text.

> text <- c("Machine Learning is my fav topic.", "I love machinelearning.")
> m <- regexpr("machine\\s?learning", text, perl = TRUE, ignore.case = TRUE)
> regmatches(text, m)

[1] "Machine Learning" "machinelearning" 
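
Note that regmatches() drops elements that have no match, so the result can be shorter than the input. If you want one result per input string with a "nothing found" fallback, as in the question, a minimal base R sketch could look like the following. The third input string and the `result` variable are added here for illustration only.

```r
text <- c("Machine Learning is my fav topic.",
          "I love machinelearning.",
          "No match here.")  # extra element, added only to show the fallback

# regexpr() returns -1 for every element without a match
m <- regexpr("machine\\s?learning", text, perl = TRUE, ignore.case = TRUE)

# Start with the fallback everywhere, then overwrite the matched positions
result <- rep("nothing found", length(text))
result[m != -1] <- regmatches(text, m)
result
```

This keeps the output aligned with the input, which is convenient when the result goes back into a data frame column.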

Upvotes: 1

akrun

Reputation: 887971

The (?i) flag makes the regex case-insensitive. Use the pattern 'Machine' followed by zero or more whitespace characters (\\s*) followed by 'Learning'.

library(stringr)
unlist(str_extract_all(text, "(?i)Machine\\s*Learning"))
#[1] "Machine Learning" "machinelearning" 
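
If stringr is not available, roughly the same extraction can be sketched in base R with gregexpr() and regmatches(). This is an assumed equivalent rather than part of the answer above, and the sample strings and the `all_matches` name are made up here to show several matches per string and more than one space.

```r
text <- c("Machine Learning and machinelearning",
          "I love machine   learning.")

# gregexpr() finds every match in each string; (?i) needs perl = TRUE
m <- gregexpr("(?i)machine\\s*learning", text, perl = TRUE)

# regmatches() returns a list with one character vector per input string
all_matches <- unlist(regmatches(text, m))
all_matches
```

Because the pattern uses \\s* rather than \\s?, it also matches when the two words are separated by more than one whitespace character.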

Upvotes: 1
