Regex explained in English

Question

I have looked here and from what I understand the following regex simply means "any unicode character sequence". Can someone confirm this please?

Current Regex: /^(?>\P{M}\p{M}*)+$/u

Also if I read the manual it says

a) \P{M} = \PM

b) (?>\PM\pM*) = \X

So with these two things in hand, can I not simplify the regex to the following?

Proposed Regex: /^\X+$/u

Which I still don't actually understand...

Bart Kiers · Accepted Answer

Yes, \P{M}\p{M}* could be simplified to \X, but not all languages support \X while (in my experience) \P{M} and \p{M} are supported more frequently.

For example, .NET's regex engine does not support \X, and Java's only gained it in Java 9 (Perl does, of course...).
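To make the "base character plus combining marks" idea concrete, here is a minimal Python sketch of what \P{M}\p{M}* picks out. Python's built-in re module supports neither \p{M} nor \X, so this emulates the pattern with unicodedata instead of running the regex itself; the helper names is_mark and base_plus_marks are made up for illustration. It also shows that the full pattern does not accept every string: it fails on empty input and on input that starts with a combining mark.

```python
import unicodedata


def is_mark(ch):
    # \p{M} covers the general categories Mn, Mc and Me.
    return unicodedata.category(ch).startswith("M")


def base_plus_marks(text):
    # Split text into chunks shaped like \P{M}\p{M}*.
    # Returns None where ^(?>\P{M}\p{M}*)+$ would fail:
    # empty input, or input whose first character is a mark.
    if not text or is_mark(text[0]):
        return None
    chunks = []
    for ch in text:
        if is_mark(ch):
            chunks[-1] += ch   # a mark attaches to the preceding base character
        else:
            chunks.append(ch)  # a non-mark starts a new chunk
    return chunks


# "e" + U+0301 (combining acute accent), then "a": two chunks
print(base_plus_marks("e\u0301a"))
# Leading combining mark: no match
print(base_plus_marks("\u0301e"))
```

So the regex is less "any Unicode character sequence" and more "one or more base characters, each optionally followed by its combining marks".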

For more info, see: http://www.regular-expressions.info/unicode.html
