Reputation: 213
I am searching for the string version in text read from a Unicode little-endian file.
With $text set to 'version (apostrophe intended), I get:
echo strpos($text, "r"); // Returns 7.
echo strpos($text, "version"); // Returns null.
I suspect that I need to convert either the needle or the haystack into the same format.
Any ideas?
Update after cmbuckley's answer.
$var = iconv('UTF-16LE', 'UTF-8', $fields[0]);
// Returns Notice: iconv(): Detected an incomplete multibyte character in ...input string in
So I checked the existing encoding and found:
echo mb_detect_encoding($fields[0], mb_detect_order(), false); // Returns 'ASCII'.
This is confusing. If the string is ASCII, why was I having trouble with the original strpos function?
Update 2
The hex encoding of 'version is 2700 5600 6500 7200 7300 6900 6f00 6e00.
What encoding is that?
Upvotes: 2
Views: 1327
Reputation: 7287
I created a file with the hex contents you provided and managed to find a solution:
<?php
$text = file_get_contents(__DIR__.'/test');
$text = mb_convert_encoding($text, 'UTF-8', 'UTF-16LE');
var_dump(strpos($text, "r")); // int(3)
var_dump(strpos($text, "Version")); // int(1)
Contents of test (viewed in Hex Fiend): [screenshot omitted]
Version of PHP used: PHP 5.6.36
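For readers without the test file, the same check can be reproduced in memory. This is a minimal sketch, assuming the mbstring extension is loaded and that the input is exactly the 16 bytes from the question's hex dump (no byte order mark). The variable names $raw and $utf8 are illustrative and not part of the original answer.

```php
<?php
// Build the haystack from the hex dump in the question (assumed: no BOM).
$raw = hex2bin('2700560065007200730069006f006e00');

var_dump(strlen($raw));                // int(16): two bytes per character
var_dump(strpos($raw, 'Version'));     // bool(false): NUL bytes sit between the letters

// Convert to UTF-8 so the haystack matches the encoding of the needle literal.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');

var_dump($utf8);                       // string(8) "'Version"
var_dump(strpos($utf8, 'Version'));    // int(1)
var_dump(strpos($utf8, 'version'));    // bool(false): strpos is case-sensitive
var_dump(stripos($utf8, 'version'));   // int(1): case-insensitive variant
```

Note that the hex dump spells the word with a capital V (56 00), so a case-sensitive search for lowercase version would still fail after conversion. stripos is one way around that if case does not matter.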
Upvotes: 1
Reputation: 42507
Even if you're using mb_strpos, you'd need to make sure $needle and $haystack are the same encoding anyway.
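To illustrate that point, the needle can be converted to the haystack's encoding instead of the other way round. The following is only a sketch, assuming the mbstring extension is available, that the PHP source file is saved as UTF-8, and that the haystack is the byte sequence from the question's hex dump.

```php
<?php
// Haystack: the UTF-16LE bytes from the question's hex dump (assumed: no BOM).
$haystack = hex2bin('2700560065007200730069006f006e00');

// Needle: convert the UTF-8 literal to the haystack's encoding.
$needle = mb_convert_encoding('Version', 'UTF-16LE', 'UTF-8');

// strpos works on bytes, so this is a byte offset.
$bytePos = strpos($haystack, $needle);

// mb_strpos with an explicit encoding returns a character offset.
$charPos = mb_strpos($haystack, $needle, 0, 'UTF-16LE');

var_dump($bytePos); // int(2)
var_dump($charPos); // int(1)
```

With the byte-level strpos, a match in UTF-16 data is only meaningful if it starts at an even offset (relative to the first code unit). An odd offset would mean the match straddles two characters.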
I'd suggest you use UTF-8 as much and as soon as possible, which means that I'd convert the UTF-16LE content to UTF-8 using iconv:
$text = file_get_contents('test.txt'); // contains 'version in UTF-16LE
var_dump(strpos($text, 'r')); // 6
var_dump(strpos($text, 'version')); // false
$text = iconv('UTF-16LE', 'UTF-8', $text);
var_dump(strpos($text, 'r')); // 3
var_dump(strpos($text, 'version')); // 1
Remember to do a strict !== false check (not null, as you mention in your post), as the file contents may start with the string version, in which case strpos would return 0.
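A short sketch of why the strict comparison matters is below. The helper name containsText is hypothetical and used only for illustration. Scalar type declarations are left out so the snippet should also run on PHP 5.6.

```php
<?php
// Hypothetical helper (not from the answer): true when $needle occurs in $haystack.
function containsText($haystack, $needle)
{
    // strpos returns int(0) for a match at the very start, so compare against false strictly.
    return strpos($haystack, $needle) !== false;
}

$text = 'version 1.2';

var_dump(strpos($text, 'version'));        // int(0)
var_dump((bool) strpos($text, 'version')); // bool(false): a loose check reports "not found"
var_dump(containsText($text, 'version'));  // bool(true)
var_dump(containsText($text, 'missing'));  // bool(false)
```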
Upvotes: 2