Reputation: 266
The following command returns 1 (a match) on a PHP 5.3.8 LAMP server (Ubuntu 11.04), but 0 (no match) on a PHP 5.3.2 LAMP server (Ubuntu 10.04.2 LTS).
<?php echo preg_match('/\w/u', 'ß'); ?>
I changed nearly every setting in the php.ini file, without success. I changed the system locale to en_US.UTF-8 and made it the default locale for PHP. I also tried the de_DE.UTF-8 locale.
In both cases I am using the default packages provided by Ubuntu.
Does anybody have another idea what to change, without compiling any packages, so that PHP 5.3.2 also returns a match?
Upvotes: 2
Views: 2403
Reputation: 306
Unicode is not yet fully supported in PHP.
The following code
$url='abc αβγ';
define('CONST_REGEX_SANITIZE_URL', '/[^\040\w\/\.\-\:]/u');
$invalid_url = preg_match(CONST_REGEX_SANITIZE_URL, $url) ? 'true' : 'false';
echo $invalid_url;
prints 'false' with PHP > 5.3.10
and 'true' with PHP < 5.3.3 (BTW, the current Debian PHP version).
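One way to make this check behave the same on both versions might be to spell out the Unicode properties instead of relying on \w. The following is only a sketch: it assumes PCRE was built with Unicode property support and that the source file is saved as UTF-8, and the constant name CONST_REGEX_SANITIZE_URL_UNICODE is made up for illustration.

```php
<?php
// Hypothetical variant of the pattern above: \p{L} (letters), \p{N} (numbers)
// and "_" stand in for \w, so the result does not depend on whether the
// u modifier also enables PCRE_UCP.
define('CONST_REGEX_SANITIZE_URL_UNICODE', '/[^\040\p{L}\p{N}_\/\.\-\:]/u');

$url = 'abc αβγ';

// preg_match() returns 1 if any character outside the allowed set is found.
$invalid_url = preg_match(CONST_REGEX_SANITIZE_URL_UNICODE, $url) ? 'true' : 'false';
echo $invalid_url;
```

With this pattern the Greek letters are classified through \p{L} rather than through \w, which is the part whose behaviour differs between versions.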
Upvotes: 0
Reputation: 655309
PHP 5.3.2 uses PCRE 8.00 while PHP 5.3.8 uses PCRE 8.11. One change in PCRE 8.10 was the addition of the PCRE_UCP option:
PCRE_UCP
This option changes the way PCRE processes \B, \b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes. By default, only ASCII characters are recognized, but if PCRE_UCP is set, Unicode properties are used instead to classify characters. More details are given in the section on generic character types in the pcrepattern page. If you set PCRE_UCP, matching one of the items it affects takes much longer. The option is available only if PCRE has been compiled with Unicode property support.
Unfortunately, you can't trigger this option directly with a pattern modifier in PHP. It is set by the u modifier, together with PCRE_UTF8, when available (PHP 5.3.4 and later).
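To check which PCRE version a given PHP build uses, and to match word characters without depending on PCRE_UCP, something like the following may help. It is a sketch rather than a definitive fix: it assumes PCRE was compiled with Unicode property support and that the source file is saved as UTF-8, and the function name contains_unicode_word_char is hypothetical.

```php
<?php
// PCRE_VERSION is a predefined PHP constant holding the PCRE version string.
echo PCRE_VERSION, "\n";

// Hypothetical helper: [\p{L}\p{N}_] lists letters, numbers and underscore
// explicitly, so it does not rely on \w being Unicode-aware.
function contains_unicode_word_char($text)
{
    return preg_match('/[\p{L}\p{N}_]/u', $text) === 1;
}

var_dump(contains_unicode_word_char('ß'));
var_dump(contains_unicode_word_char('!?'));
```

The explicit class is meant as a stand-in for \w under PCRE_UCP, so it should give the same answer whether or not the u modifier enables that option.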
Upvotes: 6