Reputation: 63
I'm trying to write a regular expression (in R) that matches all the three-letter words in this text:
tex= "As you are now so once were we"
My first attempt was to select three-letter words surrounded by spaces:
matches=str_match_all(tex," [a-z]{3} ")
It's supposed to match " you ", " are " and " now ". But each match consumes its surrounding spaces, and the space after "you" is also the space before "are", so I only get " you " and " now ".
Is there a way to fix this issue?
Thanks in advance
Upvotes: 1
Views: 258
Reputation: 849
tex= "As you are now so once were we"
Using base R functions:
regmatches(tex , gregexpr('\\b[a-z]{3}\\b' , tex))[[1]]
[1] "you" "are" "now"
Upvotes: 0
Reputation: 109
Try this:
\b[a-zA-Z]{3}\b
This works because \b doesn't match the whitespace/punctuation itself, but rather the position of the word boundary, so the spaces are not included in the match. In an R string literal the backslash has to be doubled, i.e. "\\b".
You also want to include A-Z in the character range to include uppercase letters.
This was taken from the examples at http://regexr.com/, which has a "4 letter words" example.
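As a rough illustration of the two points above (the zero-width boundary and the uppercase range), here is a minimal base R sketch. The helper name `three_letter_words` and the second sample string `tex_caps` are made up for this example and are not part of the original question.

```r
# Text from the question, plus a hypothetical string with a capitalised word.
tex <- "As you are now so once were we"
tex_caps <- "The cat sat"

# Hypothetical helper: extract three-letter words, upper or lower case.
# \\b is zero-width, so neighbouring words can share the same space.
three_letter_words <- function(x) {
  regmatches(x, gregexpr("\\b[a-zA-Z]{3}\\b", x))[[1]]
}

# Same idea with a lowercase-only range, for comparison.
lower_only <- regmatches(tex_caps, gregexpr("\\b[a-z]{3}\\b", tex_caps))[[1]]

print(three_letter_words(tex))       # "you" "are" "now"
print(three_letter_words(tex_caps))  # "The" "cat" "sat"
print(lower_only)                    # "cat" "sat"
```

The lowercase-only pattern skips "The", which is why the A-Z range matters as soon as the text contains capitalised three-letter words.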
Upvotes: -1
Reputation: 887008
It may be better to use a word boundary (\\b):
library(stringr)
str_match_all(tex,"\\b[a-z]{3}\\b")[[1]]
# [,1]
#[1,] "you"
#[2,] "are"
#[3,] "now"
Or we can use str_extract_all, which returns a character vector instead of a matrix:
str_extract_all(tex,"\\b[a-z]{3}\\b")[[1]]
#[1] "you" "are" "now"
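If the original idea of requiring a space on each side is what is wanted, one possible variation is to use lookarounds, which test for the space without consuming it. This is only a sketch in base R with `perl = TRUE`, not something taken from the answers above, and the variable name `spaced` is chosen just for this example. Unlike \\b, it would not match a three-letter word at the very start or end of the string.

```r
tex <- "As you are now so once were we"

# (?<= ) and (?= ) assert a space before/after without including it in the
# match, so the space between "you" and "are" can serve both words.
spaced <- regmatches(tex, gregexpr("(?<= )[a-z]{3}(?= )", tex, perl = TRUE))[[1]]

print(spaced)  # "you" "are" "now"
```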
Upvotes: 3