Reputation: 1872
I have a hard time understanding why `((?i)\bb.*?\b)` returns `b` and not `b-` for the string `a b- c`. I also tried `((?i)\bb\w*\b)`, but that does not work any better.
Some more info:
I need to match words in a text. I need to retrieve all words that start with the letter `b`. And 'words' means pretty much any character string that starts with a `b`, such as `b`, `b-`, `b'`, `b"` etc. Of course, the 'words' I need to match are not necessarily delimited by a space as in the example.
Upvotes: 1
Views: 108
Reputation: 25810
`*` is called a "greedy" quantifier. It'll match as many iterations of the preceding pattern as possible. Most of the time, this is exactly what you want, but sometimes you want to use a "lazy" quantifier, meaning it'll match as few as possible, including 0.
To make a quantifier "lazy", you add a question mark: `*?`, `+?`, `??`, etc.
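To see the greedy/lazy difference in isolation, here is a minimal sketch using Python's `re` module. The regex flavor in the question isn't stated, so Python is an assumption, and the sample string `banana` is made up for illustration:

```python
import re

sample = "banana"  # hypothetical sample string, not from the question

# Greedy: .* grabs as much as it can, then backs off just far enough
# for the final 'a' to match.
greedy = re.search(r"b.*a", sample).group()

# Lazy: .*? grabs as little as it can, growing only until 'a' can match.
lazy = re.search(r"b.*?a", sample).group()

# Lazy with nothing after it: zero iterations is enough, so only 'b' matches.
lazy_zero = re.search(r"b.*?", sample).group()

print(greedy)     # banana
print(lazy)       # ba
print(lazy_zero)  # b
```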
Now, the next part of the answer is how word boundaries work. Word boundaries will match a position where there's a "break" between "word characters" (0-9, a-z, A-Z and _) and "non-word characters", or between a word character and the start or end of the string. `-` is a non-word character, so in `a b- c` the position between `b` and `-` is a word boundary, as are the positions on either side of `c`.
Because you've got a lazy quantifier and there's a word boundary immediately after the `b`, that's all that your regex will match.
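A rough way to check this yourself, assuming Python's `re` module: list every offset where `\b` matches in the question's string, then run the question's pattern. One deviation to be aware of: recent Python versions only accept `(?i)` at the very start of a pattern, so the flag is moved in front of the group here.

```python
import re

text = "a b- c"  # the string from the question

# \b is zero-width, so each match is just a position in the string.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [0, 1, 2, 3, 5, 6]

# Offset 3 sits between 'b' and '-', so the lazy .*? can stop right there.
matches = re.findall(r"(?i)(\bb.*?\b)", text)
print(matches)  # ['b']
```

Offset 4 (between `-` and the space) is missing from the list because both neighbours are non-word characters.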
Rather than trying to use a word boundary to find the end of your word, just match word characters and dashes, like so, which will naturally match everything to the "end" of the word:
\bb[-\w]*
See a working example
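In case it helps, here is a small sketch of how that pattern behaves, assuming Python's `re` module. The second test string is made up for illustration and is not from the question:

```python
import re

pattern = re.compile(r"(?i)\bb[-\w]*")

# The string from the question: the dash is now part of the match.
from_question = pattern.findall("a b- c")
print(from_question)  # ['b-']

# Hypothetical extra input: dashes and word characters are consumed,
# but the character class does not include quotes, so b' stops at 'b'.
extra = pattern.findall("b-foo_1 Bar b'")
print(extra)  # ['b-foo_1', 'Bar', 'b']
```

So if quotes and similar characters should count as part of a 'word', they would have to be added to the character class as well.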
Upvotes: 1
Reputation: 2142
This should give you the desired result:
(b.*?)(?:\s|$)
I've tested it on `a b- c bfdf b32=" dfa b. b---s asd b`.
It seems like you're not looking for words but any string starting with a letter "b" delimited by a space (or other?) character(s). Your original pattern can't work because "-" doesn't qualify as part of a word. Good luck.
Note: The above pattern is very simple. The final `$` alternative is there so that the last "b", which sits at the end of the line, is captured as well.
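For reference, a quick sketch of that pattern run over the test string, assuming Python's `re` module. The second input (`abc cab`) is a made-up example to show one consequence of the pattern having no anchor before the `b`:

```python
import re

pattern = re.compile(r"(b.*?)(?:\s|$)")

# The test string from this answer; findall returns the captured group.
tested = 'a b- c bfdf b32=" dfa b. b---s asd b'
hits = pattern.findall(tested)
print(hits)  # ['b-', 'bfdf', 'b32="', 'b.', 'b---s', 'b']

# Hypothetical input: nothing requires the 'b' to start a word,
# so a 'b' in the middle of a word is picked up too.
mid_word = pattern.findall("abc cab")
print(mid_word)  # ['bc', 'b']
```

Note also that the pattern as written is case-sensitive, unlike the `(?i)` patterns in the question.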
Upvotes: 1
Reputation: 283
`.*?` is minimal, so `b.*?\b` finds the first word boundary after the `b`. Since `b` is a word character, and `-` is not, that first word boundary is between those characters.
ETA: Thing is, regexen don't consider your 'words' to be words, so `\b` won't work for them. You say your 'words' don't always end with a space. And, obviously, they won't end with a hyphen. How, more precisely, do they end?
Upvotes: 0