[Resolved] Adding the /u modifier to the regular expression fixes this issue, in case anyone else is struggling with it. Credit to M.I. in the comments :)
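Applied to the snippet in the question, the fix looks roughly like the sketch below. It assumes PHP 7.0+ (for the "\u{...}" string escape); $word is just an illustrative variable name, and ţ is written as the escape \u{0163} so the example does not depend on how the file is saved.

```php
<?php
// The word from the question. "\u{0163}" is ţ (U+0163, t with cedilla), written as
// an escape so the example does not depend on the file's own encoding.
$word = "Trimite\u{0163}i";

// /u makes PCRE read both the pattern and the subject as UTF-8, so \p{L} is
// checked against whole characters rather than individual bytes.
preg_match('/^([\p{L}]+)/u', $word, $matches);

print_r($matches); // both [0] and [1] hold the full word "Trimiteţi"
```

One side effect worth knowing: with /u, a subject that is not valid UTF-8 makes preg_match() return false instead of 0.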
Consider the following code:
var_dump('Trimiteţi');
preg_match('/^([\p{L}]+)/', 'Trimiteţi', $matches);
print_r($matches);
I am using it to match a word that might contain non-Latin characters, using \p{L}. Also notice that I don't use the end-of-string anchor $ in the preg_match pattern.
Now to the problem: when I run the code locally, I get this output:
string 'Trimiteţi' (length=10)
Array ( [0] => TrimiteÅ [1] => TrimiteÅ )
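The length=10 in the var_dump output is a byte count, not a character count: ţ (U+0163) is encoded in UTF-8 as the two bytes C5 A3. A small sketch to confirm this, assuming PHP 7.0+ and using a UTF-8 pattern to count characters (so it does not rely on the mbstring extension):

```php
<?php
$word = "Trimite\u{0163}i";

// strlen() counts bytes: ţ takes two bytes in UTF-8.
echo strlen($word), "\n";                  // 10
// Counting matches of "any character" in UTF-8 mode gives the character count.
echo preg_match_all('/./su', $word), "\n"; // 9
// The raw bytes of the last two characters: ţ (c5 a3) followed by i (69).
echo bin2hex(substr($word, -3)), "\n";     // c5a369
```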
I tried executing the code in the PHP sandbox, and it outputs something similar:
string(10) "Trimiteţi"
Array
(
[0] => Trimite�
[1] => Trimite�
)
Notice that at least this time the word in the original var_dump output was not mangled.
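What the match actually contains can be checked by dumping its bytes. A minimal sketch, assuming PHP 7.0+ ($bytesMode is an illustrative name): without /u, PCRE treats every byte as a separate code point in the 0-255 range, and \p{L} is tested against those. Byte 0xC5 read as U+00C5 (Å) is a letter, but 0xA3 read as U+00A3 (£) is not, so the match stops halfway through ţ. A lone 0xC5 byte is not valid UTF-8, which is why it shows up as � or as some other character depending on how the output is decoded.

```php
<?php
$word = "Trimite\u{0163}i";

// No /u: the subject is scanned byte by byte.
preg_match('/^([\p{L}]+)/', $word, $bytesMode);

// "Trimite" followed by the first byte of ţ only (0xC5); 0xA3 is rejected.
echo bin2hex($bytesMode[0]), "\n"; // 5472696d697465c5
echo strlen($bytesMode[0]), "\n";  // 8
```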
What is going on? Why does preg_match change the word? The worst part is that if I add $ to the end of the regular expression, it does NOT match at all, I suppose because those transformed symbols cannot be interpreted as the end of the string or something. Please help.
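The failing $ follows from the same byte-level matching: after "Trimite" and 0xC5, the next byte 0xA3 is not a letter, so the pattern can never reach the end of the subject. A short sketch comparing the anchored pattern with and without /u, assuming PHP 7.0+ ($withoutU and $withU are illustrative names):

```php
<?php
$word = "Trimite\u{0163}i";

// Byte mode: 0xA3 (the second byte of ţ) is not a letter, so $ is never reached.
$withoutU = preg_match('/^([\p{L}]+)$/', $word);
// UTF-8 mode: ţ is a single letter, the whole subject is consumed and $ matches.
$withU = preg_match('/^([\p{L}]+)$/u', $word);

var_dump($withoutU); // int(0)
var_dump($withU);    // int(1)
```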
Edit: the file I'm running is encoded as "text/x-php; charset=utf-8".
Edit 2: I also tried regex101.com: with the regular expression "^[\p{L}]+$" and the word "Trimiteţi", it matches. You can even change the regular expression to "^([\p{L}]+)$", adding a capturing group, and the site outputs:
MATCH 1
1. [0-9] `Trimiteţi`