Reputation: 4951
What's the difference between these two regular expressions (using PHP's preg_match())?
/^[0-9\x{06F0}-\x{06F9}]{1,}$/u
/^[0-9\x{06F0}-\x{06F9}\x]{1,}$/u
What's the meaning of the last \x in the second pattern?
Upvotes: 0
Views: 225
Reputation: 701
As far as I can tell, the second \x is actually an invalid character. Do both expressions work?
Upvotes: 0
Reputation:
This is weird. PHP's notation for a Unicode character is \x{}. In Perl, it is the same thing.
But PHP has the //u modifier in regexes. I assume that means Unicode. No such modifier in Perl.
In a Perl regex, \x## is parsed, where ## is required to denote an ASCII character. If it's \x or \x#, you get a warning that an illegal hex digit was ignored (because it requires exactly two digits), and it takes only the valid hex digits in the sequence. If there are no digits, as in \x, it uses the \0 ASCII character.
However, any \x{} notation is OK, and \x{0} is equivalent to \x{}. \x{0}-\x{ff} is considered ASCII, and \x{100} and above is considered Unicode.
So \x is a valid hex/Unicode escape sequence, but by itself it's assumed to be hex and is incomplete, and it's probably not something that should be left to the parser's default mechanisms.
Upvotes: 0
Reputation: 968
I think the second pattern is not valid.
According to this page, http://www.regular-expressions.info/unicode.html, \x is only useful when followed by the Unicode number:
Since \x by itself is not a valid regex token, \x{1234} can never be confused to match \x 1234 times.
Upvotes: 0
Reputation: 1070
http://www.regular-expressions.info/unicode.html
...Since \x by itself is not a valid regex token...
Upvotes: 1
Reputation: 239890
It's interpreted as \x00 (the null character), but it's almost certainly a bug caused by sloppy editing or copy and paste.
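If you want to check this yourself, here is a minimal sketch that runs both patterns from the question against a few sample strings. The sample subjects are my own assumptions, and it needs PHP 7 or later for the "\u{...}" string escapes. If the bare \x really is read as \x00, only the second pattern should accept the string that contains a NUL byte; the exact behaviour depends on the PCRE library your PHP build uses, so treat it as something to verify rather than a guarantee.

```php
<?php
// Patterns copied from the question. Single quotes keep the backslashes
// intact so that PCRE, not PHP, interprets the \x escapes.
$first  = '/^[0-9\x{06F0}-\x{06F9}]{1,}$/u';
$second = '/^[0-9\x{06F0}-\x{06F9}\x]{1,}$/u';

// Sample subjects (assumptions chosen for illustration).
$subjects = [
    'ascii digits'   => '123',
    'persian digits' => "\u{06F1}\u{06F2}\u{06F3}",
    'digit plus NUL' => "1\0", // "\0" in double quotes is a NUL byte
];

foreach ($subjects as $label => $subject) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    $r1 = preg_match($first, $subject);
    $r2 = preg_match($second, $subject);
    printf(
        "%-15s first=%s second=%s\n",
        $label,
        var_export($r1, true),
        var_export($r2, true)
    );
}
```

A result of false for the second pattern would mean your PCRE version rejects the bare \x outright instead of treating it as the null character.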
Upvotes: 4