Real Dreams

Reputation: 18010

strange behavior of preg_match_all()

The following code:

    $string ='۱۲۳۴۵۶۷۸۹۰';
    $regex ='@۱@';
    preg_match_all($regex,$string,$match);
    var_dump($match);

will output:

    array(1) {
      [0] =>
      array(1) {
        [0] =>
        string(2) "۱"
      }
    }

but

    $regex2 ='@[۱]@';
    preg_match_all($regex2,$string,$match);
    var_dump($match);

will output

    array (size=1)
      0 => 
        array (size=11)
          0 => string '�' (length=1)
          1 => string '�' (length=1)
          2 => string '�' (length=1)
          3 => string '�' (length=1)
          4 => string '�' (length=1)
          5 => string '�' (length=1)
          6 => string '�' (length=1)
          7 => string '�' (length=1)
          8 => string '�' (length=1)
          9 => string '�' (length=1)
          10 => string '�' (length=1)

What I actually want is to use a regex like [۱۲۳۴۵۶۷۸۹۰], but the function produces strange results with such patterns. I am using PHP 5.4.

Upvotes: 0

Views: 54

Answers (1)

Niet the Dark Absol

Reputation: 324630

Try adding the Unicode flag:

    $regex = '@[۱]@u';

The reason is that ۱ is actually two bytes long in UTF-8 (0xDB 0xB1), which is why your first dump shows string(2). On its own it's harmless, because the pattern simply matches that exact byte sequence. In a character class without the u flag, however, each byte is treated as a separate character, so the class matches either byte individually. Those bytes also occur in the other characters, which they do here because the digits sit close together in the Unicode map and all share the same lead byte. That is where your 11 one-byte matches come from.
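To see the difference for yourself, here is a minimal sketch that counts the matches with and without the flag. It assumes the source file is saved as UTF-8; the variable names ($bytes, $chars, $digits) are just illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8, so each digit below is two bytes.
$string = '۱۲۳۴۵۶۷۸۹۰';

// Show the raw bytes of ۱ (U+06F1).
echo bin2hex('۱'), "\n"; // dbb1

// Without the u flag the class is the byte set {0xDB, 0xB1}:
// 0xDB is the lead byte of all ten digits, 0xB1 occurs once.
$bytes = preg_match_all('@[۱]@', $string, $m1);

// With the u flag, pattern and subject are read as UTF-8,
// so the class contains the single character ۱.
$chars = preg_match_all('@[۱]@u', $string, $m2);

// The full digit class from the question, with the u flag.
$digits = preg_match_all('@[۱۲۳۴۵۶۷۸۹۰]@u', $string, $m3);

var_dump($bytes, $chars, $digits); // int(11), int(1), int(10)
```

Note that with the u flag preg_match_all() returns false if the subject is not valid UTF-8, so it is worth checking the return value when the input comes from elsewhere.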

Upvotes: 2
