Reputation: 459
RL2.2 (Extended Grapheme Clusters) in UTS #18 describes the syntax \b{w}
for word boundaries in Unicode regular expressions, but I don't quite understand how it differs from the plain \b
syntax. UTS #18 says:
\b{w}. A Unicode word boundary. Note that this is different than \b alone, which corresponds to \w and \W. See Annex C: Compatibility Properties.
So what exactly is the difference between the two syntaxes?
Upvotes: 3
Views: 727
Reputation: 29441
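\b is a zero-width match at a position between a \w and a \W character (or at the start or end of the string next to a \w character),
where, in many engines' default mode, \W = [^A-Za-z0-9_].
=> Plain \b is tied to the definition of \w, which is often just ASCII alphanumerics plus underscore, while \b{w}
follows the Unicode default word boundary rules (UAX #29), so it covers the wider Unicode character set and cases such as the apostrophe in "can't", which it does not treat as a break.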
You can see an example of an emulation of \b{w} here compared to the usual behavior.
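To make the dependence of \b on \w concrete, here is a minimal sketch using Python's standard re module. Note the assumptions: Python's re does not support \b{w} at all, so this only contrasts ASCII-only \b with Unicode-aware \b; the helper name boundaries is made up for this illustration.

```python
import re

def boundaries(pattern, text, flags=0):
    """Return the offsets where a zero-width pattern matches in text.

    Hypothetical helper for illustration only.
    """
    return [m.start() for m in re.finditer(pattern, text, flags)]

# "ete" with acute accents, written with escapes so each accented
# letter is a single precomposed code point (U+00E9).
word = "\u00e9t\u00e9"

# ASCII mode: \w is [A-Za-z0-9_], so U+00E9 counts as \W and the only
# boundaries are on either side of the plain "t".
ascii_b = boundaries(r"\b", word, re.ASCII)    # [1, 2]

# Default (Unicode) mode for str patterns: U+00E9 is a word character,
# so the boundaries are at the start and end of the whole word.
unicode_b = boundaries(r"\b", word)            # [0, 3]

# Even Unicode-aware \b is still defined purely by \w versus \W, so the
# apostrophe (a \W character) produces boundaries inside "can't".
apostrophe_b = boundaries(r"\b", "can't")      # [0, 3, 4, 5]

print(ascii_b, unicode_b, apostrophe_b)
```

The last case is where \b{w} differs even from a Unicode-aware \b: under the UAX #29 default word boundary rules there is no break on either side of the apostrophe in "can't", so an engine that implements \b{w} would be expected to match only at offsets 0 and 5 there.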
Upvotes: 3