PHP find a word in a Unicode string

Question

I am searching for the string version in text read from a Unicode little-endian file.

With $text set to 'version (the leading apostrophe is intentional), I get:

echo strpos($text, "r");          // Prints 7.
echo strpos($text, "version");    // Prints nothing (strpos returns false when there is no match).

I suspect that I need to convert either the needle or the haystack into the same format.

I had a look at mb_strpos, but it doesn't do text searches in the same way as strpos.
I also considered changing my needle string to UTF-8 but haven't tried it yet. It seems a bit messy.

Any ideas?

Update after cmbuckley's answer.

$var = iconv('UTF-16LE', 'UTF-8', $fields[0]); 
// Returns Notice: iconv(): Detected an incomplete multibyte character in ...input string in

So I checked the existing encoding and found:

echo mb_detect_encoding($fields[0], mb_detect_order(), false);  // Returns 'ASCII'.

This is confusing. If the string is ASCII why was I having trouble with the original strpos function?

Update 2

The hex encoding of 'version is 2700 5600 6500 7200 7300 6900 6f00 6e00.

What encoding is that?

Marco · Accepted Answer

I created a file with the hex contents you provided and managed to find a solution:
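
A minimal sketch of the idea, not necessarily the exact code used: the hex dump in the question (one 00 byte after each ASCII byte) is UTF-16LE, so the text can be converted to UTF-8 before searching. The function name findInUtf16le is hypothetical and only for illustration. The search is case-insensitive on the assumption that case should not matter, since the bytes 5600 in the dump are an uppercase V.

```php
<?php
/**
 * Hypothetical helper: convert raw UTF-16LE bytes to UTF-8, then search.
 * Returns the character offset of $needle, or false if it is not found.
 */
function findInUtf16le($rawBytes, $needle)
{
    // Convert the haystack so that needle and haystack share one encoding.
    $haystack = mb_convert_encoding($rawBytes, 'UTF-8', 'UTF-16LE');

    // Case-insensitive, multibyte-aware search (assumption: case is irrelevant).
    return mb_stripos($haystack, $needle, 0, 'UTF-8');
}

// The bytes from the question's hex dump: 2700 5600 6500 7200 7300 6900 6f00 6e00
$raw = hex2bin('2700560065007200730069006f006e00');

var_dump(findInUtf16le($raw, 'version')); // int(1)
var_dump(findInUtf16le($raw, 'missing')); // bool(false)
```

Compare the result with === false, because a match at offset 0 is also falsy. If the fields were produced by splitting the raw bytes on a single-byte delimiter, a piece can end up with an odd number of bytes, which would be consistent with the iconv notice in the question; converting the whole file contents before splitting should avoid that.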

Contents of test (viewed in Hex Fiend):

2700 5600 6500 7200 7300 6900 6f00 6e00

Version of PHP used: PHP 5.6.36

Answers (2)