Reputation: 184
We have a problem where a function gives different results depending on the server we run it on.
The function is the following:
<?php
$s='校';
preg_match_all( '/".*?("|$)|((?<=[\\s",+])|^)[^\\s",+]+/', $s, $matches );
Results are varying depending on the environment:
WAMP (php 5.5.12 PCRE 8.34) and
LAMP (php 5.3.3 PCRE 7.8) environments both give the same result
array (size=3)
0 =>
array (size=1)
0 => string '校' (length=3)
1 =>
array (size=1)
0 => string '' (length=0)
2 =>
array (size=1)
0 => string '' (length=0)
WS2008 IIS7 (php 5.4.24 PCRE 8.32)
array(3) {
[0]=> array(2) {
[0]=> string(1) "�"
[1]=> string(1) "�"
}
[1]=> array(2) {
[0]=> string(0) ""
[1]=> string(0) ""
}
[2]=> array(2) {
[0]=> string(0) ""
[1]=> string(0) ""
}
}
Now, the really weird thing is that with many other Japanese characters the results are correct on all environments. So far we have only been able to reproduce the issue with the character '校'. Whether it is followed by other characters ($s='校正' for example) or stands alone, the result on IIS is always different and shows what looks like an encoding problem ('�').
I first looked at the PHP and PCRE versions, but both are older on our LAMP server, which gives the correct result, so I thought the problem might be somewhere else...
Regards
Upvotes: 1
Views: 194
Reputation: 626920
When dealing with Unicode strings you need to pass the /u modifier with the pattern.
Use '/".*?("|$)|((?<=[\s",+])|^)[^\s",+]+/u'.
Also note that inside a single-quoted literal you do not need to double the backslash in \\s; a single backslash (\s) is enough.
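A minimal sketch of the fix, assuming the input is valid UTF-8. The string is written as explicit bytes (E6 A0 A1, the UTF-8 encoding of '校') so the example does not depend on how the file is saved. One plausible explanation for the IIS output, which this thread does not confirm, is that a byte-oriented \s on that build treats the middle byte 0xA0 (a no-break space in Latin-1 / Windows-1252) as whitespace and splits the character into two single bytes. The variable names are illustrative only.

```php
<?php
// UTF-8 bytes of '校' (U+6821), written explicitly so the source file
// encoding does not matter.
$s = "\xE6\xA0\xA1";

// Same pattern as in the question, with single backslashes and the /u
// modifier so PCRE works on UTF-8 characters instead of raw bytes.
$pattern = '/".*?("|$)|((?<=[\s",+])|^)[^\s",+]+/u';

$count = preg_match_all($pattern, $s, $matches);

if ($count === false) {
    // With /u, PCRE rejects subjects that are not valid UTF-8.
    echo 'preg error code: ', preg_last_error(), "\n";
} else {
    // Print each full match as hex to make the byte boundaries visible.
    print_r(array_map('bin2hex', $matches[0]));
}
```

With /u the character is matched as a single unit, so the first match should contain all three bytes (e6a0a1) rather than two separate one-byte strings.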
Upvotes: 1