Ofir

Reputation: 101

Can't split a Hebrew word into an array in PHP

I'm trying to get Hebrew input via the GET method and split it into an array. Although the page is declared as UTF-8, I still get results like this: Array ( [0] => � [1] => � [2] => � [3] => � [4] => � [5] => � [6] => � [7] => � ) (The word is מילה)

Here is my code, what am I doing wrong?

<!DOCTYPE html>
<html>
    <head>
        <title>Test</title>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 
    </head>
    <body>
        <?php
        $word = $_GET['word'];
        $arr = str_split($word);
        print_r($arr);
        ?>
    </body>
</html>

Upvotes: 0

Views: 784

Answers (3)

Dima

Reputation: 33

I don't have enough reputation to add a comment, so here is an answer instead:

There is a problem with using strlen on Hebrew and, I guess, other multibyte characters: it counts bytes, not letters.

strlen('מילה');    // equals 8, when in reality it's 4 letters
mb_strlen('מילה'); // can also equal 8, depending on the internal encoding

Better to use:

mb_strlen('מילה', "UTF-8"); // equals 4, as it should

So, taking this into account, Johannes Kling's answer becomes:

function splitMultiByte($string) {
    $output = array();
    // Count the characters once, with the encoding given explicitly.
    $length = mb_strlen($string, "UTF-8");
    for ($i = 0; $i < $length; $i++) {
        $output[] = mb_substr($string, $i, 1, 'UTF-8');
    }
    return $output;
}

mb_strlen uses the "internal character encoding" by default, so if that is not UTF-8 the count will be wrong. Setting UTF-8 explicitly is the safest option, IMHO.
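
To see the effect of the internal encoding, here is a small sketch. It assumes the mbstring extension is loaded and that the script file itself is saved as UTF-8; the variable names are only for illustration.

```php
<?php
// Assumes ext-mbstring is loaded and this file is saved as UTF-8.
$word = 'מילה';

// Remember the current setting so it can be restored afterwards.
$previous = mb_internal_encoding();

// With a single-byte internal encoding, every byte counts as one character.
mb_internal_encoding('ISO-8859-1');
$implicitCount = mb_strlen($word);           // follows the internal encoding

// An explicit encoding argument does not depend on the internal setting.
$explicitCount = mb_strlen($word, 'UTF-8');

mb_internal_encoding($previous);

var_dump($implicitCount, $explicitCount);
```

Here the first count comes out as 8 (bytes) and the second as 4 (letters), which is why passing the encoding on every call is the more robust habit.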

Upvotes: 0

A l w a y s S u n n y

Reputation: 38502

This may work for you.

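<?php
 # PHP 7.4+ has a built-in mb_str_split(), and redeclaring it is a fatal error,
 # so only define this fallback when it is missing.
 if ( ! function_exists( 'mb_str_split' ) ) {
     function mb_str_split( $string ) {
         # Split at every position that is not the start (^)
         # and not the end ($) of the string; /u enables UTF-8 mode.
         return preg_split( '/(?<!^)(?!$)/u', $string );
     }
 }

 $string   = 'מילה';
 $charlist = mb_str_split( $string );

 print_r( $charlist );
?>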
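
A common variant of the same idea splits on the empty pattern in UTF-8 mode and discards the empty pieces at both ends. This is only a minimal sketch: the function name splitUtf8Chars is made up for this example, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper name; assumes $string is valid UTF-8.
function splitUtf8Chars(string $string): array
{
    // The /u modifier makes PCRE treat the subject as UTF-8, so the empty
    // pattern matches between code points rather than between bytes.
    // PREG_SPLIT_NO_EMPTY drops the empty strings at the start and end.
    $parts = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure, e.g. for invalid UTF-8 input.
    return $parts === false ? [] : $parts;
}

print_r(splitUtf8Chars('מילה'));
```

Compared with the lookaround pattern, this version needs no assertions in the regex, at the cost of one extra flag argument.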


Another way:

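function mbStrToArray ($string) {
    $array = array(); // start empty so an empty string returns an empty array
    $strlen = mb_strlen($string, "UTF-8");
    while ($strlen) {
        // Take the first character, then drop it from the string.
        $array[] = mb_substr($string, 0, 1, "UTF-8");
        $string = mb_substr($string, 1, $strlen, "UTF-8");
        $strlen = mb_strlen($string, "UTF-8");
    }
    return $array;
}

$result = mbStrToArray('מילה');
print '<pre>';
print_r($result);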

Upvotes: 0

Johannes Kling

Reputation: 101

function splitMultiByte($string) {
  $output = array();
  // Count characters, not bytes: strlen() would report 8 for 'מילה'.
  $length = mb_strlen($string, 'UTF-8');
  for ($i = 0; $i < $length; $i++) {
    $output[] = mb_substr($string,$i,1,'UTF-8');
  }
  return $output;
}

Well, I think the problem here is that Hebrew letters are not part of ASCII. str_split() cuts the string byte by byte, so each two-byte Hebrew letter is split in half. You therefore need to work with the PHP functions that are prefixed with mb_. They work with so-called multibyte characters (letters that are represented by more than one byte).

You can use the function above. It should give you the array you expect.
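
To illustrate the difference between bytes and characters, this short sketch looks at the same string both ways. It assumes the file is saved as UTF-8 and that mbstring is available; the variable names are only for illustration.

```php
<?php
$word = 'מילה';

// str_split() cuts the string into single bytes. Each Hebrew letter is two
// bytes in UTF-8, so every piece is half a character and is shown as "�".
$bytes = str_split($word);
echo count($bytes), " bytes: ", implode(' ', array_map('bin2hex', $bytes)), "\n";

// Collect whole characters with mb_substr() and an explicit encoding.
$chars = array();
$length = mb_strlen($word, 'UTF-8');
for ($i = 0; $i < $length; $i++) {
    $chars[] = mb_substr($word, $i, 1, 'UTF-8');
}
echo count($chars), " characters: ", implode(' ', $chars), "\n";
```

The first line reports 8 pieces and the second 4, which matches the eight broken entries shown in the question.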

Upvotes: 3
