Reputation: 1816
I have a string like this:
sample İletişim form:: aşağıdaki formu
I want to extract only the words that contain a Unicode (non-ASCII) character, using PHP's preg_match or preg_match_all.
So I expect exactly two results, the words İletişim and aşağıdaki:
Array
(
[0] => İletişim
[1] => aşağıdaki
)
I can't come up with the regular expression myself, as I'm not good at them. Any help is welcome.
Thank you so much.
Upvotes: 0
Views: 305
Reputation: 313
I think a good starting point for what you want is this question: How do I detect non-ASCII characters in a string?
Using preg_match_all(), you could do something like this:
// [^\x00-\x7F] matches any non-ASCII character; the u modifier enables UTF-8 mode
preg_match_all('/\S*[^\x00-\x7F]\S*/u', $string, $matches);
print_r($matches[0]);
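As a self-contained sketch of the regex approach, here is a small wrapper run against the sample string from the question. The function name extractNonAsciiWords is hypothetical, and the pattern shown is one possible variant: it requires at least one character outside \x00-\x7F and uses the u modifier so that the string is read as UTF-8 rather than as raw bytes.

```php
<?php
// Hypothetical helper name, used only for illustration.
function extractNonAsciiWords(string $text): array
{
    // \S*           any leading non-whitespace characters
    // [^\x00-\x7F]  at least one character outside the ASCII range
    // \S*           the rest of the word
    // u             treat pattern and subject as UTF-8
    preg_match_all('/\S*[^\x00-\x7F]\S*/u', $text, $matches);

    // Index 0 holds the full matches; fall back to an empty list on failure.
    return $matches[0] ?? [];
}

$string = 'sample İletişim form:: aşağıdaki formu';
print_r(extractNonAsciiWords($string));
```

For the sample string this prints an array with İletişim at index 0 and aşağıdaki at index 1. Because the pattern uses \S, any punctuation attached to a matching word would be kept as part of the match.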
Or, without preg_match, you can use mb_detect_encoding() in strict mode to test whether each word is pure ASCII. In your case, you could use it this way:
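// array_filter() preserves the original keys, so reindex with array_values()
$matches = array_values(array_filter(explode(' ', $string), function ($item) {
    // In strict mode this returns false when $item is not valid ASCII
    return !mb_detect_encoding($item, 'ASCII', true);
}));
print_r($matches);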
But the second approach is a bit of a hack ^^
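A slightly more direct variant of the non-regex idea, as a sketch only: it swaps in mb_check_encoding(), which returns a boolean, and splits on any run of whitespace instead of a single space. The function name nonAsciiWordsByEncoding is hypothetical, and the mbstring extension is assumed to be enabled.

```php
<?php
// Hypothetical helper name, used only for illustration.
function nonAsciiWordsByEncoding(string $text): array
{
    // Split on runs of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Keep only the words that are NOT valid pure ASCII.
    $kept = array_filter($words, function (string $word): bool {
        return !mb_check_encoding($word, 'ASCII');
    });

    // array_filter() preserves keys, so reindex from 0.
    return array_values($kept);
}

$string = 'sample İletişim form:: aşağıdaki formu';
print_r(nonAsciiWordsByEncoding($string));
```

For the sample string this yields İletişim and aşağıdaki at indexes 0 and 1.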
Upvotes: 1
Reputation: 91430
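You can use Unicode properties. \pL matches any Unicode letter, and the u modifier makes the pattern and the string be treated as UTF-8: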
$string = 'sample İletişim form:: aşağıdaki formu';
preg_match_all("/(\pL+)/u", $string, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => sample
[1] => İletişim
[2] => form
[3] => aşağıdaki
[4] => formu
)
[1] => Array
(
[0] => sample
[1] => İletişim
[2] => form
[3] => aşağıdaki
[4] => formu
)
)
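The \pL+ pattern returns every word, including the plain ASCII ones. To get only the two words the question asks for, one option (a sketch, not part of the original answer) is to filter the matches afterwards and keep those containing at least one character outside the ASCII range. The variable name $nonAscii is an assumption made for this example.

```php
<?php
$string = 'sample İletişim form:: aşağıdaki formu';

// Collect every run of Unicode letters (no capture group needed).
preg_match_all('/\pL+/u', $string, $matches);

// Keep only words with at least one non-ASCII character, then reindex.
$nonAscii = array_values(array_filter($matches[0], function (string $word): bool {
    return preg_match('/[^\x00-\x7F]/u', $word) === 1;
}));

print_r($nonAscii);
```

With the sample string, $nonAscii holds İletişim and aşağıdaki. Since \pL+ matches letters only, trailing punctuation such as the :: in form:: never ends up in a result.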
Upvotes: 1