Reputation: 48711
I was looking into another question that came with this issue.
I wonder why using \p{L} results in false when using PHP >= 5.3.4 but true in earlier versions?
print_r(preg_match("@^\d+\s+\p{L}+\s+\d+$@", "20 Août 2014"));
\p{L} should work as expected in PCRE 8.30 to 8.34, as I could test in environments like RegexBuddy:
So PHP 5.4.14 (PCRE 8.30) through 5.6 (PCRE 8.34) should give the same result, as I couldn't find any custom change made to PHP's bundled PCRE:
And according to @user1578653's answer, using the letter Å (hexadecimal code 0xc5) should produce different outputs; however, it doesn't (!), although it should match.
Upvotes: 3
Views: 114
Reputation: 5028
It seems from the PHP changelog for v 5.3.4 (http://php.net/ChangeLog-5.php) that one of the changes was that they "Upgraded bundled PCRE to version 8.10. (Ilia)".
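To check which PCRE version a given PHP build actually uses, a minimal sketch like the following can help. It relies only on the built-in `PHP_VERSION` and `PCRE_VERSION` constants; the variable name `$report` is just for illustration.

```php
<?php
// Print the PHP version together with the PCRE library version it is using.
// PCRE_VERSION is a string such as "8.10 2010-06-25"; the exact format
// depends on the build.
$report = sprintf("PHP %s uses PCRE %s", PHP_VERSION, PCRE_VERSION);
echo $report, PHP_EOL;
```

Running this on each environment being compared shows whether the builds sit on either side of the PCRE 8.10 upgrade.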
The changelog for PCRE v8.10 (http://www.pcre.org/changelog.txt) mentions several things regarding the \p escape, specifically points 12 and 15. Perhaps these are related to your problem?
I have done some more tests and I think this is the cause of the difference. Point 15 in the PCRE changelog states that:
If a repeated Unicode property match (e.g. \p{Lu}*) was used with non-UTF-8 input, it could crash or give wrong results if characters with values greater than 0xc0 were present in the subject string. (Detail: it assumed UTF-8 input when processing these items.)
If you replace the 'û' character with any character whose code is below 0xc0, you get the same result on all versions of PHP. If you replace it with any character whose code is 0xc0 or above, you get the difference between PHP versions that you are seeing. So it must be caused by this update to the PCRE library!
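A small sketch of that experiment follows, for PHP >= 5.3.4. It assumes the subject string is UTF-8 encoded (so 'û' is the two bytes 0xC3 0xBB), and it also tries the `u` pattern modifier, which is not mentioned above and is added here as an assumption about how one might make PCRE treat the subject as UTF-8. The variable names are hypothetical.

```php
<?php
// Pattern from the question, single-quoted so the backslashes reach PCRE untouched.
$pattern     = '@^\d+\s+\p{L}+\s+\d+$@';
$patternUtf8 = $pattern . 'u';           // same pattern with the UTF-8 modifier (assumption)

$ascii = "20 Aout 2014";                 // every byte is below 0xc0
$utf8  = "20 Ao\xC3\xBBt 2014";          // "Août" with û encoded as UTF-8 bytes C3 BB

// Only ASCII bytes: the result should not depend on the PCRE version.
$asciiResult = preg_match($pattern, $ascii);        // 1

// Without the u modifier each byte is tested separately; 0xBB is not a letter,
// so \p{L}+ stops in the middle of the word and the match fails.
$byteResult = preg_match($pattern, $utf8);          // 0 on PHP >= 5.3.4

// With the u modifier the two bytes are decoded as the single letter û.
$utf8Result = preg_match($patternUtf8, $utf8);      // 1

var_dump($asciiResult, $byteResult, $utf8Result);
```

If the second call returns 0 and the third returns 1, the difference comes from how the bytes at or above 0xc0 are interpreted, which is consistent with point 15 of the PCRE changelog.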
Upvotes: 3