Extract every occurrence of a Unicode (non-ASCII) word within a string using preg_match

Question

I have a string like this:

sample İletişim form:: aşağıdaki formu

What I'm aiming to do is extract the words that contain a Unicode/non-ASCII character, using PHP's preg_match or preg_match_all.

So I expect a result containing only the two words İletişim and aşağıdaki:

Array
(
    [0] => İletişim 
    [1] => aşağıdaki
)

I just can't come up with the regular expression, as I'm not good at regex. Any help is welcome.

Thank you so much.

Lebugg · Accepted Answer

Using preg_match_all(), you could do something like this:

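// no /u modifier: the class matches raw bytes outside 0x20-0x7F (UTF-8 bytes >= 0x80, but also tabs/newlines)
preg_match_all('/[^\s]*[^\x20-\x7f]+[^\s]*/', $string, $matches);
print_r($matches[0]); // $matches[0] holds the full-pattern matches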
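A slightly more robust variant, as a sketch: adding the /u modifier makes PCRE read the subject as UTF-8, so [^\x00-\x7F] matches one code point above U+007F instead of a single byte, and a tab or newline between two ASCII words is no longer mistaken for a "non-ASCII" character. The helper name extractNonAsciiWords is hypothetical, and the input is assumed to be valid UTF-8 (with /u, preg_match_all returns false on malformed input).

```php
<?php
// extractNonAsciiWords() is a hypothetical helper name, not a built-in.
function extractNonAsciiWords(string $text): array
{
    // \S* on either side grabs the rest of the whitespace-delimited word;
    // [^\x00-\x7F] requires at least one code point above U+007F.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    if (preg_match_all('/\S*[^\x00-\x7F]\S*/u', $text, $matches) === false) {
        // preg_match_all returns false, e.g. when $text is not valid UTF-8
        return [];
    }
    // $matches[0] is the flat list of full-pattern matches
    return $matches[0];
}

print_r(extractNonAsciiWords('sample İletişim form:: aşağıdaki formu'));
// prints an array with [0] => İletişim and [1] => aşağıdaki
```

Punctuation glued to a word (for example "aşağıdaki,") stays part of the match, just as with the original pattern, because \S* stops only at whitespace.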

Or, without preg_match, you can use mb_detect_encoding() in strict mode to test whether each word is plain ASCII. In your case, you could use it this way:

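// strict detection against 'ASCII' alone returns false for any word containing a byte above 0x7F
$matches = array_filter(explode(' ', $string), function ($item) {
    return !mb_detect_encoding($item, 'ASCII', true);
});
// array_filter keeps the original keys (1 and 3 here), so reindex before printing
print_r(array_values($matches));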
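The same filter as a self-contained sketch. It swaps in mb_check_encoding($w, 'ASCII'), which answers the yes/no "is this plain ASCII?" question directly, in place of mb_detect_encoding(), and reindexes the result so it matches the expected output. It assumes PHP 7.4+ (for the fn arrow function) and the mbstring extension; nonAsciiWords is a hypothetical name.

```php
<?php
// Assumes PHP 7.4+ (arrow function) and the mbstring extension.
// nonAsciiWords() is a hypothetical helper name, not a built-in.
function nonAsciiWords(string $text): array
{
    // Words are assumed to be separated by single spaces, as in the question.
    // Repeated spaces produce '' items, which are valid ASCII and get dropped below.
    $words = explode(' ', $text);

    // mb_check_encoding($w, 'ASCII') is true only if every byte is 0x00-0x7F,
    // so keeping the items where it is false leaves the non-ASCII words.
    $nonAscii = array_filter($words, fn (string $w): bool => !mb_check_encoding($w, 'ASCII'));

    // array_filter preserves the original keys; array_values reindexes from 0.
    return array_values($nonAscii);
}

print_r(nonAsciiWords('sample İletişim form:: aşağıdaki formu'));
```

Because it splits only on the space character, a word separated by a tab or newline is not split off on its own; the /u regex handles that case.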

The second approach is a bit convoluted, though. ^^
