Reputation: 11295
PHP regex to find all capitalized words in a string:
$string = "test sample test: 2015. ŽYDRŪNAS PAVARDENIS";
preg_match_all('/\b([A-Z-][\p{L}\pL]+)\b/', $string, $matches);
var_dump($matches);
Output:
array(2) {
[0]=>
array(2) {
[0]=>
string(8) "YDRŪNAS"
[1]=>
string(10) "PAVARDENIS"
}
[1]=>
array(2) {
[0]=>
string(8) "YDRŪNAS"
[1]=>
string(10) "PAVARDENIS"
}
}
My question is: where did the symbol 'Ž' disappear to? How can I modify the regex so that it does not drop UTF-8 characters?
Upvotes: 3
Reputation: 158280
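Basically, you need to use the u modifier when working with Unicode strings. Without it, PCRE treats the subject as raw bytes: in UTF-8, Ž is encoded as two bytes, neither of which matches [A-Z-], so the match only starts at Y. The regex can also be simplified with the POSIX character class [[:upper:]]; with the u modifier, PHP enables Unicode property support, so the class matches uppercase Unicode letters such as Ž and Ū, not just A-Z.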
Like this:
$string = "test sample test: 2015. ŽYDRŪNAS PAVARDENIS";
preg_match_all("/[[:upper:]]+/u", $string, $matches);
var_dump($matches);
Output:
array(1) {
[0]=>
array(2) {
[0]=>
string(10) "ŽYDRŪNAS"
[1]=>
string(10) "PAVARDENIS"
}
}
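A minimal sketch of the two points above, assuming the PHP file is saved as UTF-8 and PHP's bundled PCRE has Unicode support. The first part counts how many characters "." sees in "Ž" with and without u. The second part keeps the question's original intent (whole capitalized words rather than runs of uppercase letters) by replacing the ASCII-only [A-Z-] with the Unicode property \p{Lu}; the lookbehind (?<!\p{L}) is my own choice for anchoring at the start of a word instead of relying on \b.

```php
<?php
// Assumption: this file is saved as UTF-8.

// Without /u, PCRE matches bytes: "Ž" (U+017D) is two UTF-8 bytes,
// so "." matches twice. With /u it is a single character.
$bytes = preg_match_all('/./', "Ž");
$chars = preg_match_all('/./u', "Ž");
echo "bytes: $bytes, chars: $chars\n";

$string = "test sample test: 2015. ŽYDRŪNAS PAVARDENIS";

// (?<!\p{L})  not preceded by a letter, i.e. the start of a word
// \p{Lu}      one uppercase letter in any script (Ž, Ū, A, ...)
// \p{L}*      the remaining letters of the word
preg_match_all('/(?<!\p{L})\p{Lu}\p{L}*/u', $string, $matches);
print_r($matches[0]);
```

For this input both this pattern and [[:upper:]]+ return ŽYDRŪNAS and PAVARDENIS. They differ on mixed-case words: for "Test", [[:upper:]]+ returns only "T", while the \p{Lu}\p{L}* version returns "Test".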
Upvotes: 5