user16948

Reputation: 4951

Difference between these two regular expressions?

What's the difference between these two regular expressions (used with PHP's preg_match())?

/^[0-9\x{06F0}-\x{06F9}]{1,}$/u

/^[0-9\x{06F0}-\x{06F9}\x]{1,}$/u

What's the meaning of the last \x in the second pattern?

Upvotes: 0

Views: 225

Answers (5)

dpk2442

Reputation: 701

As far as I can tell, the trailing \x in the second pattern is an invalid escape sequence. Do both expressions work?

Upvotes: 0

user557597

Reputation:

This is weird. The PHP notation for a Unicode character is \x{}. In Perl, it is the same thing.

But PHP has the /u modifier for regexes. I assume that means Unicode. As far as I know, Perl has no equivalent modifier.

In a Perl regex, \x## is parsed, where ## are the two hex digits that denote an ASCII character. If it is \x or \x#, Perl warns "illegal hex digit ignored" (because it expects exactly two digits) and uses only the valid hex digits in the sequence. If there are no digits at all, as in a bare \x, it uses the \0 ASCII character.

However, any \x{} notation is OK, and \x{0} is equivalent to \x{}. \x{0}-\x{ff} is considered ASCII, and \x{100} and above is considered Unicode.

So \x is a valid hex/Unicode escape sequence, but by itself it is assumed to be hex and is incomplete, and it is probably not something that should be left to the parser's default mechanisms.

Upvotes: 0

tomraithel

Reputation: 968

I think the second pattern is not valid.

According to http://www.regular-expressions.info/unicode.html, \x is only useful when followed by the Unicode code point number:

Since \x by itself is not a valid regex token, \x{1234} can never be confused to match \x 1234 times.
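
To make the quoted point concrete, here is a minimal sketch in PHP. It assumes PHP 7.0 or later (for the "\u{...}" string escape) and the /u modifier; the variable names are made up for illustration. It shows that the braces in \x{06F5} name a single code point rather than a repetition count, so a separate quantifier is needed to repeat it.

```php
<?php
// U+06F5 is EXTENDED ARABIC-INDIC DIGIT FIVE, inside the range used in the question.
$five = "\u{06F5}";

// \x{06F5} names exactly one code point; the braces are not a quantifier.
$one = preg_match('/^\x{06F5}$/u', $five);

// To repeat that code point, a quantifier has to follow the escape.
$three = preg_match('/^\x{06F5}{3}$/u', str_repeat($five, 3));

// A single occurrence does not satisfy the {3} quantifier.
$tooFew = preg_match('/^\x{06F5}{3}$/u', $five);

var_dump($one, $three, $tooFew);
```

This only exercises the braced form; it says nothing about how a bare \x is treated.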

Upvotes: 0

hpekristiansen

Reputation: 1070

http://www.regular-expressions.info/unicode.html

...Since \x by itself is not a valid regex token...

Upvotes: 1

hobbs

Reputation: 239890

It's interpreted as \x00 (the null character) but it's almost certainly a bug caused by sloppy editing or copy and paste.
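
One way to check this is to run both patterns from the question against a few subjects. This is a rough sketch, assuming PHP 7.0+ and PCRE's default handling of \x (zero to two hex digits are read when no brace follows); the sample strings and variable names are invented for the test.

```php
<?php
// Patterns copied verbatim from the question.
$strict = '/^[0-9\x{06F0}-\x{06F9}]{1,}$/u';
$stray  = '/^[0-9\x{06F0}-\x{06F9}\x]{1,}$/u';

// Hypothetical sample subjects.
$samples = [
    'ascii digits'   => '2024',
    'persian digits' => "\u{06F1}\u{06F2}\u{06F3}",
    'nul byte'       => "\0",
    'digit plus nul' => "7\0",
    'letters'        => 'abc',
];

foreach ($samples as $label => $subject) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    printf(
        "%-15s strict=%s stray=%s\n",
        $label,
        var_export(preg_match($strict, $subject), true),
        var_export(preg_match($stray, $subject), true)
    );
}
```

If the bare \x is read as \x00, the two patterns should agree on every subject except those containing a NUL byte, which only the second pattern accepts.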

Upvotes: 4
