Reputation: 10633
I have looked here, and from what I understand the following regex simply means "any Unicode character sequence". Can someone confirm this, please?
Current Regex: /^(?>\P{M}\p{M}*)+$/u
Also, according to the manual:
a) \P{M} = \PM
b) (?>\PM\pM*) = \X
So with these two things in hand, can I not simplify the regex to the following?
Proposed Regex: /^\X+$/u
Which I still don't actually understand...
Upvotes: 1
Views: 212
Reputation: 170288
Yes, \P{M}\p{M}* could be simplified to \X, but not all languages support \X, while (in my experience) \P{M} and \p{M} are supported more frequently. For example, .NET's regex engine does not support \X, and Java's only gained it in Java 9 (Perl does, of course...).
For more info, see: http://www.regular-expressions.info/unicode.html
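For engines that offer neither \X nor \p{M}, the grouping that \P{M}\p{M}* performs can be approximated by hand. The following is only a minimal sketch in Python, whose built-in re module does not support \p{...} classes; it relies on the standard unicodedata module instead, and the helper names is_mark and split_clusters are made up for illustration.

```python
import unicodedata


def is_mark(ch):
    # The general categories Mn, Mc and Me together form the Mark (M) class.
    return unicodedata.category(ch).startswith("M")


def split_clusters(text):
    """Group each non-mark character with the marks that directly follow it."""
    clusters = []
    for ch in text:
        if is_mark(ch) and clusters:
            # A mark attaches to the cluster started by the previous base character.
            clusters[-1] += ch
        else:
            # A non-mark starts a new cluster. A mark at the very start of the
            # text has no base character, so it ends up in a cluster of its own.
            clusters.append(ch)
    return clusters


# "e" followed by U+0301 (combining acute accent), then a plain "a"
print(len(split_clusters("e\u0301a")))
```

Here the accented "e" and its combining accent end up in one cluster, which is the unit that a single \P{M}\p{M}* (or \X) is meant to consume.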
Upvotes: 2
Reputation: 20300
^ # start of string, followed by
(?> # an atomic (independent, non-backtracking) group, which does not capture, containing
\P{M} # a single Unicode character which is not in the `Mark` category
\p{M}* # 0 or more characters in the `Mark` category
)+ # with this group repeated 1 or more times
$ # the end of the string (or just before a final newline)
Whereas ^\X+$ contains no explicit group; \X is otherwise equivalent to (?>\P{M}\p{M}*), and neither pattern captures anything.
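To see which strings the anchored pattern accepts as a whole, the breakdown above can be walked through by hand. This is a rough sketch rather than a real regex engine: it uses Python's standard unicodedata module because the built-in re module has no \p{M}, it ignores the UTF-8 validity check that the /u modifier adds, and the function name matches_pattern is invented for the example.

```python
import unicodedata


def is_mark(ch):
    # Mn, Mc and Me are the three general categories in the Mark (M) class.
    return unicodedata.category(ch).startswith("M")


def matches_pattern(text):
    # Emulates ^(?>\P{M}\p{M}*)+$ one group iteration at a time.
    i = 0
    groups = 0
    while i < len(text):
        if is_mark(text[i]):
            # \P{M} must match first, so a group cannot start on a mark.
            return False
        i += 1  # consume the single non-mark character (\P{M})
        while i < len(text) and is_mark(text[i]):
            i += 1  # consume the trailing marks (\p{M}*)
        groups += 1
    # The + quantifier requires at least one group.
    return groups >= 1


for sample in ["abc", "e\u0301", "", "\u0301e"]:
    print(repr(sample), matches_pattern(sample))
```

Under these assumptions the pattern accepts any non-empty string that does not begin with a combining mark, so it is slightly stricter than "any Unicode character sequence": the empty string and a string starting with a stray mark are both rejected.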
Upvotes: 2