Reputation: 1
I have the following sample text and I want to extract only "Machine Learning" from it, whether or not it is written with a space.
text <- c("Machine Learning is my fav topic.", "I love machinelearning.")
ifelse((found <- regexpr("\\sMachine Learning", text, perl = TRUE)) != -1,
       substring(text, found, found + attr(found, "match.length")),
       "nothing found")
But it returns:
"nothing found" "nothing found"
The result I expect is:
"Machine Learning", "machinelearning"
Upvotes: 0
Views: 48
Reputation: 648
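Two points:
1) To match both phrases you mention, use the pattern "machine\\s?learning" (the backslash is doubled inside an R string). The ? after \s makes the whitespace optional, so the pattern matches with or without the space. Pass ignore.case = TRUE so that "Machine Learning" and "machinelearning" both match.
2) Use regexpr() to find the match, then use regmatches() to extract the matched text.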
> text <- c("Machine Learning is my fav topic.", "I love machinelearning.")
> m <- regexpr("machine\\s?learning", text, perl=TRUE, ignore.case = TRUE)
> regmatches (text, m)
[1] "Machine Learning" "machinelearning"
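Note that regmatches() drops elements with no match, whereas the code in the question returned "nothing found" for them. The original pattern also failed because "\\s" requires a whitespace character before "Machine", and neither sample string has one in front of a correctly cased "Machine Learning". A minimal sketch that keeps one result per input follows; the third input string and the names hit and result are added here for illustration and are not part of the question.

```r
text <- c("Machine Learning is my fav topic.",
          "I love machinelearning.",
          "No match here.")  # hypothetical extra element, for illustration only

# Position of the first match in each string, -1 where there is none
m <- regexpr("machine\\s?learning", text, perl = TRUE, ignore.case = TRUE)

# Start with the fallback value, then fill in the matched substrings
result <- rep("nothing found", length(text))
hit <- as.vector(m) != -1          # as.vector() drops the match.length attribute
result[hit] <- regmatches(text, m) # regmatches() returns only the matched elements
result
```

Filling a preallocated vector avoids computing substring offsets by hand, which is where an off-by-one error can creep in.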
Upvotes: 1
Reputation: 887971
The (?i) makes the regex case-insensitive. Use the pattern 'Machine' followed by zero or more whitespace characters (\\s*) followed by 'Learning'.
library(stringr)
unlist(str_extract_all(text, "(?i)Machine\\s*Learning"))
#[1] "Machine Learning" "machinelearning"
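If you would rather not depend on stringr, roughly the same thing can be sketched in base R with gregexpr() and regmatches(). This is an alternative, not what the answer above uses; the third input string and the names matches and all_matches are assumptions added for illustration.

```r
text <- c("Machine Learning is my fav topic.",
          "I love machinelearning.",
          "MACHINE LEARNING vs machine learning")  # hypothetical, has two matches

# gregexpr() finds every match in each string; perl = TRUE enables the (?i) flag
matches <- regmatches(text, gregexpr("(?i)machine\\s*learning", text, perl = TRUE))

# matches is a list with one character vector per input; flatten it
all_matches <- unlist(matches)
all_matches
```

Like str_extract_all(), this returns every occurrence, so a string containing the phrase twice contributes two elements to the flattened result.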
Upvotes: 1