Reputation: 101
I'm trying to get Hebrew input via the GET method and split it into an array. Even though the page is declared as UTF-8, I still get results like this: Array ( [0] => � [1] => � [2] => � [3] => � [4] => � [5] => � [6] => � [7] => � ) (The word is מילה)
Here is my code, what am I doing wrong?
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<?php
$word = $_GET['word'];
$arr = str_split($word);
print_r($arr);
?>
</body>
</html>
Upvotes: 0
Views: 784
Reputation: 33
I don't have enough reputation to add a comment, so I'm posting an answer instead:
There is a problem with using strlen on Hebrew and, I guess, other multibyte characters.
strlen('מילה') // equals 8, when in reality it's 4 letters
mb_strlen('מילה') // can also equal 8 if the internal encoding is not UTF-8
Better use:
mb_strlen('מילה', "UTF-8") // equals 4, as it should
So, taking Johannes Kling's answer and applying this to it, we get:
function splitMultiByte($string) {
    $output = array();
    // Count characters (not bytes) once, instead of on every loop iteration.
    $length = mb_strlen($string, "UTF-8");
    for ($i = 0; $i < $length; $i++) {
        $output[] = mb_substr($string, $i, 1, 'UTF-8');
    }
    return $output;
}
mb_strlen uses the "internal character encoding" by default, so if that is not UTF-8 the count will be wrong. Passing UTF-8 explicitly is the safest option, IMHO.
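To make the difference concrete, here is a small sketch comparing the two loop limits. The helper name sliceChars is made up for this illustration, and the word is written as raw UTF-8 bytes so the result should not depend on how the file is saved.

```php
<?php
// "מילה" as raw UTF-8 bytes (each Hebrew letter takes two bytes).
$word = "\xD7\x9E\xD7\x99\xD7\x9C\xD7\x94";

// Hypothetical helper: collect $limit one-character slices of $string.
function sliceChars($string, $limit) {
    $output = array();
    for ($i = 0; $i < $limit; $i++) {
        $output[] = mb_substr($string, $i, 1, 'UTF-8');
    }
    return $output;
}

$byByteCount = sliceChars($word, strlen($word));             // limit = 8 bytes
$byCharCount = sliceChars($word, mb_strlen($word, 'UTF-8')); // limit = 4 characters

echo count($byByteCount), "\n"; // 8: four letters followed by four empty strings
echo count($byCharCount), "\n"; // 4
```

With the byte count as the limit, the loop keeps asking for characters past the end of the string, so the array is padded with empty entries.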
Upvotes: 0
Reputation: 38502
This may work for you.
<?php
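# PHP 7.4+ already ships a built-in mb_str_split(), and redeclaring it is a fatal error,
# so only define this fallback when the function is missing.
if (!function_exists('mb_str_split')) {
    function mb_str_split( $string ) {
        # Split at every position that is not after the start: ^
        # and not before the end: $
        return preg_split('/(?<!^)(?!$)/u', $string );
    }
}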
$string = 'מילה';
$charlist = mb_str_split( $string );
print_r( $charlist );
?>
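Another way:
function mbStrToArray ($string) {
    $array = array(); // initialize so an empty string returns an empty array
    $strlen = mb_strlen($string, "UTF-8");
    while ($strlen) {
        // Take the first character, then continue with the rest of the string.
        $array[] = mb_substr($string, 0, 1, "UTF-8");
        $string = mb_substr($string, 1, $strlen, "UTF-8");
        $strlen = mb_strlen($string, "UTF-8");
    }
    return $array;
}
$result = mbStrToArray('מילה');
print '<pre>';
print_r($result);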
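A shorter variant of the preg_split idea above, as a sketch: an empty pattern with the /u modifier plus PREG_SPLIT_NO_EMPTY. The function name splitUtf8Chars is hypothetical, and it assumes the input really is valid UTF-8 (preg_split returns false otherwise).

```php
<?php
// Hypothetical helper name, chosen so it cannot clash with the
// built-in mb_str_split() of PHP 7.4+.
function splitUtf8Chars($string) {
    // The empty pattern matches between every character; /u makes PCRE
    // treat the subject as UTF-8, and PREG_SPLIT_NO_EMPTY drops the
    // empty pieces at the start and end.
    return preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
}

// "מילה" written as raw UTF-8 bytes.
$chars = splitUtf8Chars("\xD7\x9E\xD7\x99\xD7\x9C\xD7\x94");
echo count($chars), "\n"; // 4
```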
Upvotes: 0
Reputation: 101
function splitMultiByte($string) {
    $output = array();
    for ($i = 0; $i < strlen($string); $i++) {
        $output[] = mb_substr($string,$i,1,'UTF-8');
    }
    return $output;
}
I think the problem is that Hebrew letters are not part of ASCII: in UTF-8 each one is stored as more than one byte, and str_split() cuts the string byte by byte. You therefore need to work with the PHP functions prefixed with mb_, which operate on so-called multibyte characters (characters represented by more than one byte).
You can use the function above. It should give you an array as expected.
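If it helps to see where the � symbols come from, this sketch dumps what str_split() actually returns for the word. It only uses standard functions (str_split, bin2hex, mb_check_encoding); the word is written as raw UTF-8 bytes, which is an assumption about how the browser sends it.

```php
<?php
// "מילה" written as raw UTF-8 bytes.
$word = "\xD7\x9E\xD7\x99\xD7\x9C\xD7\x94";

// str_split() cuts after every byte, so each 2-byte letter is torn in half.
$bytes = str_split($word);
$hex = array_map('bin2hex', $bytes);
echo implode(' ', $hex), "\n"; // d7 9e d7 99 d7 9c d7 94

// None of the halves is valid UTF-8 by itself, which is why the
// browser renders each one as a replacement character.
$valid = array_filter($bytes, function ($b) {
    return mb_check_encoding($b, 'UTF-8');
});
echo count($valid), "\n"; // 0
```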
Upvotes: 3