Arno

preg_match returns different characters than the input string

[Resolved] If anyone else is struggling with this: adding the /u modifier to the regular expression fixes the issue. Credit to M.I. in the comments :)
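A minimal sketch of that fix, applied to the code from the question. It assumes PHP 7.0+ and writes "ţ" as the escape \u{0163} so that the example does not depend on how the script file itself is saved:

```php
<?php
// "Trimiteţi": \u{0163} is LATIN SMALL LETTER T WITH CEDILLA (PHP 7.0+ escape).
$word = "Trimite\u{0163}i";

// Same pattern as in the question, plus /u so that PCRE reads the subject
// as UTF-8 code points instead of single bytes.
preg_match('/^([\p{L}]+)/u', $word, $matches);

print_r($matches); // [0] and [1] now both hold the full word "Trimiteţi"
```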

Consider the following code:

var_dump('Trimiteţi');
preg_match('/^([\p{L}]+)/', 'Trimiteţi', $matches);
print_r($matches);

I am using \p{L} to match a word that might contain non-Latin characters. Also note that I don't use the end-of-string anchor $ in the pattern passed to preg_match.

Now to the problem: when I run the code locally, I get this output:

string 'Trimiteţi' (length=10)
Array ( [0] => TrimiteÅ [1] => TrimiteÅ )
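The length=10 for a 9-letter word is a hint: var_dump() reports the length in bytes, and "ţ" (U+0163) takes two bytes in UTF-8. A small sketch to check this, again assuming PHP 7.0+ for the \u{0163} escape:

```php
<?php
$word = "Trimite\u{0163}i"; // "Trimiteţi"

// strlen() counts bytes, which is what var_dump() shows as "length".
$bytes = strlen($word);

// "ţ" (U+0163) is encoded in UTF-8 as the two bytes 0xC5 0xA3.
$hexT = bin2hex("\u{0163}");

// Splitting on the empty pattern with /u yields one element per code point.
$chars = count(preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY));

printf("bytes=%d chars=%d t-cedilla=%s\n", $bytes, $chars, $hexT);
// prints: bytes=10 chars=9 t-cedilla=c5a3
```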

I also ran the code in the PHP sandbox, and it outputs something similar:

string(10) "Trimiteţi"
Array
(
    [0] => Trimite�
    [1] => Trimite�
)
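A sketch of what the pattern most likely matches without /u. PCRE then walks the subject byte by byte: byte 0xC5 is read as U+00C5 ("Å", a letter), so \p{L} accepts it, while byte 0xA3 is read as U+00A3 ("£", a currency symbol), so the match stops there. The capture therefore ends in a lone 0xC5 byte, which a Latin-1/Windows-1252 display renders as "Å" and a UTF-8 page renders as "�". This reading of the two outputs above is an inference, not something confirmed in the question:

```php
<?php
$word = "Trimite\u{0163}i"; // "Trimiteţi" as UTF-8 bytes

// No /u: \p{L} is tested against each byte as if it were a code point < 256.
preg_match('/^([\p{L}]+)/', $word, $matches);

// Hex dump of the capture: "Trimite" followed by the single byte c5.
echo bin2hex($matches[1]), "\n"; // prints 5472696d697465c5
```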

Notice that at least this time the original word in the var_dump output was not garbled.

What is going on? Why does preg_match change the word? Worst of all, if I add $ to the end of the regular expression, it does NOT MATCH at all; I suppose the transformed symbols somehow cannot be matched up to the end of the string. Please help!
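The failing $ anchor follows from the same byte-wise reading: without /u, \p{L}+ cannot consume byte 0xA3, so ^...$ can never span the whole string. A sketch comparing both variants of the anchored pattern (PHP 7.0+ assumed for the escape):

```php
<?php
$word = "Trimite\u{0163}i"; // "Trimiteţi"

// Byte-wise: matching stops before 0xA3, so $ is never reached -> 0.
$bytewise = preg_match('/^[\p{L}]+$/', $word);

// UTF-8 mode: "ţ" is one letter code point, so the anchored pattern matches -> 1.
$utf8 = preg_match('/^[\p{L}]+$/u', $word);

var_dump($bytewise, $utf8); // int(0) int(1)
```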

Edit: the file encoding that I'm running is set to "text/x-php; charset=utf-8"
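Given the charset=utf-8 setting, one way to confirm at runtime that a string really is valid UTF-8 is an empty pattern with /u: preg_match() returns false instead of 0 or 1 when the subject is not well-formed UTF-8. The helper name isValidUtf8 below is hypothetical, used only for illustration:

```php
<?php
// Hypothetical helper: true only if $s is well-formed UTF-8.
function isValidUtf8(string $s): bool
{
    // With /u, preg_match() returns false on malformed UTF-8 input.
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8("Trimite\u{0163}i")); // bool(true): well-formed UTF-8
var_dump(isValidUtf8("Trimite\xC5"));      // bool(false): truncated 2-byte sequence
```

If the mbstring extension is available, mb_check_encoding($s, 'UTF-8') is an alternative.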
Edit2: I also tried regex101.com: with the regular expression "^[\p{L}]+$" and the word "Trimiteţi", it matches. If you switch the regular expression to "^([\p{L}]+)$", adding a capturing group, the site outputs:

MATCH 1
1.  [0-9]   `Trimiteţi`
