Xiangyu.Wu

Reputation: 459

About the \b{w} word boundary syntax in Unicode regular expressions

RL2.2 (Extended Grapheme Clusters) in UTS #18 describes the \b{w} syntax for word boundaries in Unicode regular expressions, but I don't quite understand how it differs from the plain \b syntax. UTS #18 says:

\b{w}. A Unicode word boundary. Note that this is different than \b alone, which corresponds to \w and \W. See Annex C: Compatibility Properties.

So what exactly is the difference between the two syntaxes?

Upvotes: 3

Views: 727

Answers (1)

Thomas Ayoub

Reputation: 29441

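\b is a zero-width assertion: it matches at any position that has a \w character on one side and a \W character (or the start or end of the string) on the other.

In many engines \W = [^A-Za-z0-9_], so plain \b only deals with ASCII alphanumerics, while \b{w} deals with the Unicode character set (i.e. a wider set of letters and digits) and follows the Unicode default word boundary rules rather than a simple \w/\W transition.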

You can see an example of an emulation of \b{w} here compared to the usual behavior.
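To make the "\b corresponds to \w and \W" part concrete, here is a minimal sketch using Python's standard `re` module. Note the assumptions: Python's `re` does not support \b{w}, so this only shows how plain \b behaves, and the helper names `boundaries` and `emulated_boundaries` are made up for this illustration. `re.ASCII` is used to reproduce the ASCII-only \w described above (Python 3 string patterns are Unicode-aware by default).

```python
import re

def boundaries(text, flags=0):
    # Offsets at which the plain \b assertion matches.
    return [m.start() for m in re.finditer(r"\b", text, flags)]

def emulated_boundaries(text, flags=0):
    # The same offsets derived from \w alone: a boundary is any position
    # where exactly one neighbour is a \w character. Positions outside
    # the string count as non-word.
    def is_word(i):
        return 0 <= i < len(text) and re.fullmatch(r"\w", text[i], flags) is not None
    return [i for i in range(len(text) + 1) if is_word(i - 1) != is_word(i)]

# The apostrophe is \W, so plain \b splits "can't" around it.
print(boundaries("can't"))                      # [0, 3, 4, 5]
print(emulated_boundaries("can't"))             # [0, 3, 4, 5]

# With an ASCII-only \w, the accented letters stop counting as word characters.
print(boundaries("\u00e9t\u00e9"))              # [0, 3]
print(boundaries("\u00e9t\u00e9", re.ASCII))    # [1, 2]
```

Under the Unicode default word boundary rules (UAX #29), which are what \b{w} refers to, "can't" has no boundary around the apostrophe, so only offsets 0 and 5 would be boundaries there. That is a difference plain \b cannot express, no matter how wide the \w set is.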

Upvotes: 3
