Reputation: 18030
I ran the following script using php.exe:
preg_replace('#(?:^[^\pL]*)|(?:[^\pL]*$)#u','',$string);
or its equivalent:
preg_replace('#(?:^[^\pL]*|[^\pL]*$)#u','',$string);
If $string = "S" or $string = " ذذ ", it works. If $string = 'ذ', it yields �, which is incorrect, and if $string = 'ذذ', PHP crashes.
But it works in versions 4.4.0 - 4.4.9 and 5.0.5 - 5.1.6.
What is wrong?
<?php
$string='دد';
echo preg_replace('#(?:^[^\pL]*)|(?:[^\pL]*$)#u','',$string);
Output for 5.4.0 - 5.5.0alpha6
Process exited with code 139.
Output for 5.2.0 - 5.3.22, 5.5.0beta1
(no output)
Output for 4.4.0 - 4.4.9, 5.0.5 - 5.1.6
دد
Output for 4.3.11, 5.0.0 - 5.0.4
Warning: preg_replace(): Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at offset 7 in /in/T3rpV on line 3
Output for 4.3.0 - 4.3.10
Warning: Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at offset 7 in /in/T3rpV on line 3
Upvotes: 20
Views: 2486
Reputation: 18030
Finally, the bug was fixed:
Output for 4.4.0 - 4.4.9, 5.0.5 - 5.1.6, 5.5.27 - 5.5.33, 5.6.11 - 7.0.4, hhvm-3.6.1 - 3.12.0
دد
Upvotes: 0
Reputation: 173652
From looking at the expression itself, there are two things that could be improved:
The * quantifiers aren't very useful; why would you want to replace a potentially empty match with an empty string? In fact, running this on my system yields NULL from the preg_replace() operation.
The two non-capturing groups can be merged into one.
This is the code after applying both improvements:
$string = 'ذذ';
var_dump(preg_replace('#(?:^[^\pL]+|[^\pL]+$)#u', '', $string));
// string(4) "ذذ"
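Because preg_replace() returns NULL when PCRE fails (the NULL mentioned above), a defensive caller can check for it and ask preg_last_error() why. A minimal sketch, assuming PHP 5.2 or later (where preg_last_error() exists); the function name strip_non_letters is hypothetical:

```php
<?php
// Hypothetical helper: strip leading/trailing non-letters with the improved
// pattern, and surface PCRE errors instead of silently returning NULL.
function strip_non_letters($string)
{
    $result = preg_replace('#(?:^[^\pL]+|[^\pL]+$)#u', '', $string);
    if ($result === null) {
        // preg_last_error() reports why PCRE failed, e.g.
        // PREG_BAD_UTF8_ERROR when the subject is not valid UTF-8.
        throw new RuntimeException('preg_replace failed, PCRE error ' . preg_last_error());
    }
    return $result;
}

var_dump(strip_non_letters(' ذذ! '));
```

Throwing is just one choice here; returning the original string or logging the error code would work equally well.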
If you're just looking for a multibyte trim function (supported from 4.3.0 onwards):
$string=' دد';
var_dump(preg_replace('#(?:^\s+|\s+$)#u', '', $string));
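The same pattern can be wrapped in a reusable helper. A minimal sketch; the name utf8_trim is hypothetical (recent PHP versions, 8.4 and later, ship a built-in mb_trim(), so a custom name avoids clashing with it):

```php
<?php
// Hypothetical helper (not a built-in): remove whitespace, as matched by \s,
// from both ends of a UTF-8 string using the /u regex shown above.
function utf8_trim($string)
{
    return preg_replace('#(?:^\s+|\s+$)#u', '', $string);
}

var_dump(utf8_trim(" دد\t"));
```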
Upvotes: 1
Reputation: 1088
Use preg_quote(); you have to properly escape special characters before using them in your regex. For example:
<?php
$string = preg_quote("\دد");
echo preg_replace('#(?:^[^\pL]*)|(?:[^\pL]*$)#u','',$string);
See it in action: http://3v4l.org/LeBXg
More about preg_quote.
Cheers,
Ardy
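To see what the preg_quote() call changes, here is a quick check: it escapes the backslash in the subject, so the string passed to preg_replace() begins with non-letter characters. A minimal sketch, assuming the source file is saved as UTF-8:

```php
<?php
// "\دد" in double quotes keeps the backslash literally (it is not an escape
// sequence), so the subject is: one backslash + two Arabic letters.
$raw = "\دد";
$quoted = preg_quote($raw);

// preg_quote() doubles the backslash: two backslashes + two letters.
var_dump($raw, $quoted);
var_dump(strlen($raw), strlen($quoted)); // in UTF-8 each letter is 2 bytes
```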
Upvotes: 0
Reputation: 56
Maybe this will help:
These properties are usually only available if PCRE is compiled with "--enable-unicode-properties".
http://docs.php.net/manual/en/regexp.reference.unicode.php#96479
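Whether the PCRE library PHP was built against accepts \pL can be probed at runtime: if the pattern fails to compile, preg_match() returns false (and emits a warning, suppressed here with @). A minimal sketch; the function name is hypothetical:

```php
<?php
// Hypothetical probe: returns true if this PCRE build accepts \pL under /u.
function pcre_supports_unicode_properties()
{
    // Without Unicode property support the pattern fails to compile and
    // preg_match() returns false instead of 0 or 1.
    return @preg_match('/\pL/u', 'a') === 1;
}

var_dump(pcre_supports_unicode_properties());
echo PCRE_VERSION, "\n"; // version of the PCRE library PHP was built with
```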
Upvotes: 3
Reputation:
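You can use the mb_ereg_replace() function as an alternative. Note that it takes a bare pattern, without the # delimiters and the u modifier that preg_replace() needs:
$string = 'دد';
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");
// Oniguruma syntax: no delimiters, and the letter property is written \p{L}
echo mb_ereg_replace('^[^\p{L}]*|[^\p{L}]*$', '', $string);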
Upvotes: 5