Blofeld

Reputation: 63

consecutive matches in regex (R)

I'm trying to write a regular expression (in R) that matches all the three-letter words in this text:

tex= "As you are now so once were we"

My first attempt is to select words containing 3 letters surrounded by spaces:

library(stringr)
matches <- str_match_all(tex, " [a-z]{3} ")

I expected it to match " you ", " are " and " now ". But regex matches cannot overlap: the space after "you" is consumed by the first match, so it is no longer available as the leading space of " are ". As a result I only get " you " and " now ".

Is there a way to fix this?

Thanks in advance

Upvotes: 1

Views: 258

Answers (3)

dondapati

Reputation: 849

tex <- "As you are now so once were we"

Using base R functions:

regmatches(tex, gregexpr("\\b[a-z]{3}\\b", tex))[[1]]

[1] "you" "are" "now"

Upvotes: 0

Idloj

Reputation: 109

Try this:

\b[a-zA-Z]{3}\b

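This works because \b doesn't match the whitespace or punctuation itself, but rather the position of the word boundary, so the spaces are not consumed and are not included in the match. Note that inside an R string literal the backslash has to be doubled, so the pattern is written "\\b[a-zA-Z]{3}\\b".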

You also want to include A-Z in the character range to include uppercase letters.
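As a rough sketch of the two points above (the doubled backslash in R and the effect of adding A-Z), something like the following should work in base R. The `demo` sentence and the variable names are made up for illustration, and `perl = TRUE` is just one reasonable choice of regex engine here, not something the question requires.

```r
# Same sentence as in the question.
tex <- "As you are now so once were we"

# In an R string literal the backslash is doubled: "\\b" is the regex \b.
pattern <- "\\b[a-zA-Z]{3}\\b"

# gregexpr() locates every match, regmatches() extracts the matched text.
three_letter <- regmatches(tex, gregexpr(pattern, tex, perl = TRUE))[[1]]
print(three_letter)
# [1] "you" "are" "now"

# Hypothetical sentence (not from the question) with a capitalised
# three-letter word, to show what adding A-Z changes.
demo <- "The cat sat"
lower_only <- regmatches(demo, gregexpr("\\b[a-z]{3}\\b", demo, perl = TRUE))[[1]]
both_cases <- regmatches(demo, gregexpr(pattern, demo, perl = TRUE))[[1]]
print(lower_only)  # "The" is missed because of the capital T
print(both_cases)  # "The" is included
```

For the sentence in the question both character classes give the same result, since "As" is the only capitalised word and it has two letters.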

This was taken from the examples at http://regexr.com/, which include a "4 letter words" example.

Upvotes: -1

akrun

Reputation: 887008

It may be better to use a word boundary (\\b), which matches a position rather than a character:

library(stringr)
str_match_all(tex,"\\b[a-z]{3}\\b")[[1]]
#   [,1] 
#[1,] "you"
#[2,] "are"
#[3,] "now"

Or we can use str_extract_all, which returns a character vector instead of a matrix:

str_extract_all(tex,"\\b[a-z]{3}\\b")[[1]]
#[1] "you" "are" "now"
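If the words really do have to be delimited by literal spaces, as in the original attempt, a different technique that might help is lookaround: the spaces are checked but not consumed, so neighbouring matches can share them. This is only a sketch in base R with `perl = TRUE`; the `edge` string and the variable names are invented for the example.

```r
# Same sentence as in the question.
tex <- "As you are now so once were we"

# (?<= ) requires a space just before the word and (?= ) a space just after,
# but neither space becomes part of the match, so it can be reused.
lookaround <- "(?<= )[a-z]{3}(?= )"

spaced <- regmatches(tex, gregexpr(lookaround, tex, perl = TRUE))[[1]]
print(spaced)
# [1] "you" "are" "now"

# Hypothetical edge case: words at the very start or end of the string have
# no space on one side, so only the middle word is found.
edge <- "you are now"
edge_hits <- regmatches(edge, gregexpr(lookaround, edge, perl = TRUE))[[1]]
print(edge_hits)
# [1] "are"
```

The edge case is the main reason \\b is usually the simpler choice: it also matches at the start and end of the string and next to punctuation.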

Upvotes: 3
