Alexander Solonik

Reputation: 10230

Understanding the usage of \b in regex that matches multiple strings

I just found the below regex online while browsing:

(?:^|\b)(bitcoin atm|new text|bitcoin|test a|test)(?!\w)

I was just curious to know: what is the advantage of using (?:^|\b) here?

I understand that (?:) is a non-capturing group, but I am a bit stumped by the ^|\b inside these parentheses. I do understand that ^ asserts the start of the string.

The examples of \b on MDN gave me a fair understanding of what \b does, but I am still not able to put things into context based on the example I have provided.

Upvotes: 0

Views: 99

Answers (1)

Wiktor Stribiżew

Reputation: 626929

The (?:^|\b) is a non-capturing group that contains two alternatives, both of which are zero-width assertions. That means they only match locations in a string and do not affect the text you get in the match.
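To illustrate the zero-width point, here is a minimal JavaScript sketch. The sample string "buy bitcoin now" and the variable names are made up for the illustration; the regex is the one from the question.

```javascript
// The regex from the question, unchanged.
const re = /(?:^|\b)(bitcoin atm|new text|bitcoin|test a|test)(?!\w)/;

// Hypothetical sample input, chosen only for this illustration.
const m = re.exec("buy bitcoin now");

// (?:^|\b) and (?!\w) consume no characters, so the overall match
// is exactly the keyword, with no surrounding spaces.
const matchedText = m[0];   // "bitcoin"
const capturedText = m[1];  // "bitcoin" (the only capturing group)
const matchIndex = m.index; // 4, the position of the "b"

console.log(matchedText, capturedText, matchIndex);
```

The overall match and the captured group are the same here because everything outside the capturing group is an assertion rather than a consuming pattern.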

Besides, since every alternative in the next subpattern starts with b, n or t (a word char), the \b (a word boundary) in the first non-capturing group also matches at the beginning of a string, which makes the ^ (start of string anchor) alternative redundant here.

Thus, you may safely use

\b(bitcoin atm|new text|bitcoin|test a|test)(?!\w)

and even

\b(bitcoin atm|new text|bitcoin|test a|test)\b

since the alternatives end with a word char here.
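As a rough check of this equivalence, the sketch below runs the three variants over a few sample strings. The samples and the helper name allMatches are assumptions made up for the illustration, and agreement on these inputs is a sanity check, not a proof.

```javascript
// The alternation from the question, shared by all three variants.
const alternatives = "(bitcoin atm|new text|bitcoin|test a|test)";

const original = new RegExp("(?:^|\\b)" + alternatives + "(?!\\w)", "g");
const noCaret = new RegExp("\\b" + alternatives + "(?!\\w)", "g");
const bothBoundaries = new RegExp("\\b" + alternatives + "\\b", "g");

// Hypothetical helper: collect the full text of every match.
function allMatches(re, input) {
  return [...input.matchAll(re)].map((m) => m[0]);
}

// Hypothetical sample inputs.
const samples = [
  "bitcoin atm nearby",
  "a test a day",
  "testing bitcoins",
  "new text, test",
];

const results = samples.map((input) => ({
  input,
  original: allMatches(original, input),
  noCaret: allMatches(noCaret, input),
  bothBoundaries: allMatches(bothBoundaries, input),
}));

console.log(JSON.stringify(results, null, 2));
```

Note that "testing bitcoins" yields no matches with any variant, since both "test" and "bitcoin" are followed by a word char there.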

If the alternatives in the (bitcoin atm|new text|bitcoin|test a|test) group are user-defined, dynamic, and can start or end with a non-word char, then the (?:^|\b) and (?!\w) regex patterns make sense, but the result would not be precise: (?:^|\b)\.txt(?!\w) will not match .txt as a whole word in the middle of a string, because \b before the dot requires a preceding word char. I would use (?:^|\W) rather than (?:^|\b).
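The difference can be seen in a small JavaScript sketch. The sample strings are made up, and the capturing group around \.txt in the second regex is an addition for this illustration, used to pull the keyword out of the match.

```javascript
// Left-hand check from the question's regex, with a keyword that starts with a non-word char.
const withBoundary = /(?:^|\b)\.txt(?!\w)/;
// Suggested alternative; the capturing group is an assumption added for this sketch.
const withNonWord = /(?:^|\W)(\.txt)(?!\w)/;

// Space followed by "." is not a word boundary, so \b fails mid-string.
const boundaryAfterSpace = withBoundary.test("open .txt files");  // false
// A word char before the dot does create a boundary.
const boundaryAfterWord = withBoundary.test("open a.txt files");  // true
// At the very start, the ^ branch applies.
const boundaryAtStart = withBoundary.test(".txt files");          // true

const nonWordMatch = withNonWord.exec("open .txt files");
// \W consumes the preceding space, so the overall match includes it.
const nonWordFull = nonWordMatch[0];    // " .txt"
const nonWordKeyword = nonWordMatch[1]; // ".txt"
// Preceded by a word char: neither ^ nor \W applies.
const nonWordAfterWord = withNonWord.test("open a.txt files");    // false

console.log(boundaryAfterSpace, boundaryAfterWord, boundaryAtStart);
console.log(JSON.stringify(nonWordFull), nonWordKeyword, nonWordAfterWord);
```

Unlike \b, \W is not zero-width: it consumes the character before the keyword, so the keyword itself has to be read from a capturing group rather than from the overall match.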

Upvotes: 2
