TheTrueTDF

Reputation: 184

PHP preg_match_all gives a different result on IIS, only with a specific Japanese character

We have a problem where a function gives different results depending on the server it runs on.

The function is the following:

<?php
$s='校';
preg_match_all( '/".*?("|$)|((?<=[\\s",+])|^)[^\\s",+]+/', $s, $matches );

The results vary depending on the environment.

The WAMP (PHP 5.5.12, PCRE 8.34) and LAMP (PHP 5.3.3, PCRE 7.8) environments both give the same result:

array (size=3)
  0 => 
    array (size=1)
      0 => string '校' (length=3)
  1 => 
    array (size=1)
      0 => string '' (length=0)
  2 => 
    array (size=1)
      0 => string '' (length=0)

WS2008 IIS7 (PHP 5.4.24, PCRE 8.32):

array(3) { 
    [0]=> array(2) { 
        [0]=> string(1) "�" 
        [1]=> string(1) "�" 
    } 
    [1]=> array(2) { 
        [0]=> string(0) "" 
        [1]=> string(0) "" 
    }
    [2]=> array(2) { 
        [0]=> string(0) "" 
        [1]=> string(0) "" 
    }
}

The really odd part is that most other Japanese characters give correct results in every environment. So far we have only been able to reproduce the issue with the character '校'. Whether it appears alone or together with other characters ($s='校正', for example), the result on IIS is always different and shows what looks like an encoding problem ('�').

I first looked at the PHP and PCRE versions, but our LAMP server runs older versions of both and works correctly, so I suspect the problem lies somewhere else...

Regards

Upvotes: 1

Views: 194

Answers (1)

Wiktor Stribiżew

Reputation: 626920

When dealing with Unicode strings you need to pass the /u modifier with the pattern.
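To illustrate why byte mode matters for this particular character, here is a minimal sketch. '校' (U+6821) is encoded in UTF-8 as the three bytes E6 A0 A1, and without /u the regex engine works on those bytes one at a time. My assumption, which the question does not confirm, is that the PCRE build or locale on the IIS machine treats byte 0xA0 (the non-breaking space in Latin-1) as whitespace for \s, which would split the character into two one-byte matches, consistent with the two string(1) results shown.

```php
<?php
// UTF-8 bytes of '校' (U+6821), written explicitly so the example does not
// depend on the encoding of this source file.
$s = "\xE6\xA0\xA1";

// Length in bytes, not characters.
$byteCount = strlen($s);

// Byte mode: '.' matches each of the three bytes separately.
$byteModeCount = preg_match_all('/./', $s, $byteMatches);

// UTF-8 mode: '.' matches the whole character once.
$utf8ModeCount = preg_match_all('/./u', $s, $utf8Matches);

echo $byteCount, "\n";     // 3
echo $byteModeCount, "\n"; // 3
echo $utf8ModeCount, "\n"; // 1
```

Characters whose UTF-8 encoding does not contain such a byte would not be affected, which may be why most other Japanese characters work on every server.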

Use

'/".*?("|$)|((?<=[\s",+])|^)[^\s",+]+/u'
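As a rough sketch of how the call from the question might look with that pattern: the variable names $pattern and $count are mine, and the error check is an optional extra rather than part of the original code.

```php
<?php
// UTF-8 bytes of '校', written explicitly so the example does not depend on
// the encoding of this source file.
$s = "\xE6\xA0\xA1";

// Same pattern as in the question, with the /u modifier added so that both
// the pattern and the subject are treated as UTF-8.
$pattern = '/".*?("|$)|((?<=[\s",+])|^)[^\s",+]+/u';

$count = preg_match_all($pattern, $s, $matches);

// With /u, a subject that is not valid UTF-8 makes the call fail,
// so it is worth checking for false.
if ($count === false) {
    echo 'preg error code: ', preg_last_error(), "\n";
} else {
    var_dump($matches[0]); // one match containing the whole 3-byte character
}
```

With /u the character is matched as a single unit, which is the result the WAMP and LAMP servers in the question already produce.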

Also note that inside a single-quoted string literal you do not need to double the backslash in \\s; a single backslash (\s) is enough.
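A quick way to check this, as a small sketch (the variable names are only for illustration): in a single-quoted PHP string only \\ and \' are escape sequences, so both spellings produce the same pattern string.

```php
<?php
// In single quotes, '\\s' collapses to a backslash followed by 's',
// and '\s' is left untouched, so the two literals are identical.
$double = '/[^\\s",+]+/';
$single = '/[^\s",+]+/';

var_dump($double === $single); // bool(true)
```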

Upvotes: 1
