Reputation: 1081
This code:
setlocale(LC_ALL, 'pl_PL', 'pl', 'Polish_Poland.28592');
$result = mb_stripos("ĘÓĄŚŁŻŹĆŃ",'ęóąśłżźćń');
returns false.
How can I fix that?
P.S. The answers to "stripos returns false when special characters is used" do not solve this.
UPDATE: I made a test:
function test() {
$search = "zawór"; $searchlen=strlen($search);
$opentag="<valve>"; $opentaglen=strlen($opentag);
$closetag="</valve>"; $closetaglen=strlen($closetag);
$test[0]['input']="test ZAWÓR test"; //normal test
$test[1]['input']="X\nX\nX ZAWÓR X\nX\nX"; //white char test
$test[2]['input']="<br> ZAWÓR <br>"; //html newline test
$test[3]['input']="ĄąĄą ZAWÓR ĄąĄą"; //polish diacritical test
$test[4]['input']="テスト ZAWÓR テスト"; //japanese katakana test
foreach ($test as $key => $val) {
$position = mb_stripos($val['input'],$search,0,'UTF-8');
if($position!=false) {
$output = $val['input'];
$output = substr_replace($output, $opentag, $position, 0);
$output = substr_replace($output, $closetag, $position+$opentaglen+$searchlen, 0);
$test[$key]['output'] = $output;
}
else {
$test[$key]['output'] = null;
}
}
return $test;
}
FIREFOX OUTPUT:
$test[0]['output'] == "test <valve>ZAWÓR</valve> test" // ok
$test[1]['output'] == "X\nX\nX <valve>ZAWÓR</valve> X\nX\nX" // ok
$test[2]['output'] == "<br> <valve>ZAWÓR</valve> <br>" // ok
$test[3]['output'] == "Ąą�<valve>�ą ZA</valve>WÓR ĄąĄą" // WTF??
$test[4]['output'] == "テ�<valve>��ト </valve>ZAWÓR テスト" // WTF??
The solution from https://drupal.org/node/1107268 does not change anything.
Upvotes: 1
Views: 2428
Reputation: 1081
Solution from https://gist.github.com/stemar/8287074 :
function mb_substr_replace($string, $replacement, $start, $length=NULL) {
if (is_array($string)) {
$num = count($string);
// $replacement
$replacement = is_array($replacement) ? array_slice($replacement, 0, $num) : array_pad(array($replacement), $num, $replacement);
// $start
if (is_array($start)) {
$start = array_slice($start, 0, $num);
foreach ($start as $key => $value)
$start[$key] = is_int($value) ? $value : 0;
}
else {
$start = array_pad(array($start), $num, $start);
}
// $length
if (!isset($length)) {
$length = array_fill(0, $num, 0);
}
elseif (is_array($length)) {
$length = array_slice($length, 0, $num);
foreach ($length as $key => $value)
$length[$key] = isset($value) ? (is_int($value) ? $value : $num) : 0;
}
else {
$length = array_pad(array($length), $num, $length);
}
// Recursive call
return array_map(__FUNCTION__, $string, $replacement, $start, $length);
}
preg_match_all('/./us', (string)$string, $smatches);
preg_match_all('/./us', (string)$replacement, $rmatches);
if ($length === NULL) $length = mb_strlen($string);
array_splice($smatches[0], $start, $length, $rmatches[0]);
return join("",$smatches[0]);
}
Used in place of substr_replace, this solves the problem with function test().
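For the single-string case in test(), a shorter alternative might be to avoid byte offsets entirely and cut the string with mb_substr. This is only a minimal sketch, assuming UTF-8 input and assuming the matched text has the same number of characters as the search term; wrap_match is a hypothetical helper, not part of the gist.

```php
<?php
// Hypothetical helper (not from the gist): wraps the first case-insensitive
// match of $search in $open/$close, using character offsets throughout.
// Assumption: $text and $search are valid UTF-8.
function wrap_match($text, $search, $open, $close) {
    $pos = mb_stripos($text, $search, 0, 'UTF-8');
    if ($pos === false) {
        return null; // strict check, so a match at offset 0 is not lost
    }
    $len = mb_strlen($search, 'UTF-8');

    return mb_substr($text, 0, $pos, 'UTF-8')        // text before the match
         . $open
         . mb_substr($text, $pos, $len, 'UTF-8')     // the match, original case
         . $close
         . mb_substr($text, $pos + $len, null, 'UTF-8'); // the rest
}

echo wrap_match("ĄąĄą ZAWÓR ĄąĄą", "zawór", "<valve>", "</valve>"), "\n";
```

Because every offset here is a character offset, there is nothing to convert between characters and bytes.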
Upvotes: 0
Reputation: 522175
The function works fine when told what encoding your strings are in:
var_dump(mb_stripos("ĘÓĄŚŁŻŹĆŃ",'ęóąśłżźćń', 0, 'UTF-8')); // 0
                                                ^^^^^^^
Without the explicit encoding argument, it falls back to the internal encoding (see mb_internal_encoding()), which may not match your strings, so it cannot treat them correctly.
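If passing the encoding to every call is inconvenient, setting the internal encoding once should have the same effect. A minimal sketch, assuming the source file itself is saved as UTF-8:

```php
<?php
// Assumption: this file (and therefore its string literals) is UTF-8 encoded.
// Set the encoding that mb_* functions use when none is passed explicitly.
mb_internal_encoding('UTF-8');

$pos = mb_stripos("ĘÓĄŚŁŻŹĆŃ", 'ęóąśłżźćń');
var_dump($pos); // int(0)
```

Compare the result with === false rather than != false, since a match at offset 0 is a valid result.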
The problem with your test code is that you're mixing character-based indices with byte-offset-based indices. mb_strpos returns offsets in characters, while substr_replace works with byte offsets. Read about the topic here: What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.
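To make the mismatch concrete, here is a small sketch using one of the failing inputs from the question. It assumes UTF-8 strings, and the variable names are only illustrative. The character offset is converted to a byte offset by measuring the byte length of the text in front of the match:

```php
<?php
// Assumption: this file and its string literals are UTF-8 encoded.
$text   = "ĄąĄą ZAWÓR ĄąĄą";
$search = "zawór";

// Character offset, as reported by the mb_* functions.
$charPos = mb_stripos($text, $search, 0, 'UTF-8');

// Byte offset: the byte length of everything before the match.
$bytePos = strlen(mb_substr($text, 0, $charPos, 'UTF-8'));

// Byte length of the matched text itself, taken from $text.
$matchChars = mb_strlen($search, 'UTF-8');
$matchBytes = strlen(mb_substr($text, $charPos, $matchChars, 'UTF-8'));

// Insert the closing tag first so the opening tag's offset stays valid.
$output = substr_replace($text, '</valve>', $bytePos + $matchBytes, 0);
$output = substr_replace($output, '<valve>', $bytePos, 0);

var_dump($charPos); // int(5)
var_dump($bytePos); // int(9)
echo $output, "\n"; // ĄąĄą <valve>ZAWÓR</valve> ĄąĄą
```

The two offsets differ because each Ą and ą occupies two bytes in UTF-8, which is why the tags landed in the middle of a character in the original test.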
If you want to wrap a certain word in tags in a multi-byte string, I'd rather suggest this approach:
preg_replace('/zawór/iu', '<valve>$0</valve>', $text)
Note that $text must be UTF-8 encoded; /u regular expressions only work with UTF-8.
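If the search term comes from a variable rather than being hard-coded, it should probably be escaped before it goes into the pattern. A minimal sketch, assuming both strings are valid UTF-8; wrap_word is a hypothetical helper name:

```php
<?php
// Hypothetical helper: wraps every case-insensitive occurrence of $search
// in <valve> tags. Assumption: $text and $search are valid UTF-8.
function wrap_word($text, $search) {
    // preg_quote escapes regex metacharacters, and the delimiter "/" as well.
    $pattern = '/' . preg_quote($search, '/') . '/iu';

    // $0 is the matched text, so the original casing is preserved.
    return preg_replace($pattern, '<valve>$0</valve>', $text);
}

echo wrap_word("テスト ZAWÓR テスト", "zawór"), "\n";
```

Unlike the mb_stripos approach, this replaces all occurrences, and preg_replace returns null if the subject is not valid UTF-8.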
Upvotes: 2
Reputation: 1081
Using your tip, dear Rikesh, I wrote this:
function patched_mb_stripos($content,$search) {
// Lowercase both strings as UTF-8, then search with the encoding passed explicitly.
$content=mb_convert_case($content, MB_CASE_LOWER, "UTF-8");
$search=mb_convert_case($search, MB_CASE_LOWER, "UTF-8");
return mb_stripos($content,$search,0,"UTF-8");
}
and it seems to work :)
Upvotes: 1