dodo254

Reputation: 519

Split string into an array of letters - double character letters PHP

I need to split a string into an array of letters. The problem is that my language (Croatian) also has double-character letters (e.g. lj, nj, dž).

So a string such as ljubičicajecvijet should be split into an array that looks like this:

Array
(
    [0] => lj
    [1] => u
    [2] => b
    [3] => i
    [4] => č
    [5] => i
    [6] => c
    [7] => a
    [8] => j
    [9] => e
    [10] => c
    [11] => v
    [12] => i
    [13] => j
    [14] => e
    [15] => t
)

Here is the list of Croatian letters in an array (I included the English letters as well).

$alphabet= array(
            'a', 'b', 'c',
            'č', 'ć', 'd',
            'dž', 'đ', 'e',
            'f', 'g', 'h',
            'i', 'j', 'k',
            'l', 'lj', 'm',
            'n', 'nj', 'o',
            'p', 'q', 'r',
            's', 'š', 't',
            'u', 'v', 'w',
            'x', 'y', 'z', 'ž'
          );

Upvotes: 3

Views: 731

Answers (2)

junkfoodjunkie

Reputation: 3178

Alternatively, you can check each pair of characters against the alphabet and keep the pair together when it matches a double letter. (For this solution you could reduce the $alphabet array to just the double-character letters.)

<?php

ini_set('display_errors',1); // this should be commented out in production environments
error_reporting(E_ALL); // this should be commented out in production environments


$string = 'ljubičicajecvijet';

$alphabet= [
            'a', 'b', 'c',
            'č', 'ć', 'd',
            'dž', 'đ', 'e',
            'f', 'g', 'h',
            'i', 'j', 'k',
            'l', 'lj', 'm',
            'n', 'nj', 'o',
            'p', 'q', 'r',
            's', 'š', 't',
            'u', 'v', 'w',
            'x', 'y', 'z', 'ž'
          ];

function str_split_unicode($str, $length = 1) {
    $tmp = preg_split('~~u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($length > 1) {
        $chunks = array_chunk($tmp, $length);
        foreach ($chunks as $i => $chunk) {
            $chunks[$i] = join('', (array) $chunk);
        }
        $tmp = $chunks;
    }
    return $tmp;
}

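// Split into single characters, then merge any adjacent pair that is a double letter.
$chars = str_split_unicode($string);
$new_array = [];

for ($i = 0, $n = count($chars); $i < $n; $i++) {
    // Look one character ahead; checking at every position catches double letters at odd offsets too.
    if ($i + 1 < $n && in_array($chars[$i] . $chars[$i + 1], $alphabet, true)) {
        $new_array[] = $chars[$i] . $chars[$i + 1];
        $i++; // skip the second half of the double letter
    } else {
        $new_array[] = $chars[$i];
    }
}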

print_r($new_array);

?>

Upvotes: 1

Georges O.

Reputation: 992

You can use this kind of solution:

Data:

$text = 'ljubičicajecviježdžt';

$alphabet = [
            'a', 'b', 'c',
            'č', 'ć', 'd',
            'dž', 'đ', 'e',
            'f', 'g', 'h',
            'i', 'j', 'k',
            'l', 'lj', 'm',
            'n', 'nj', 'o',
            'p', 'q', 'r',
            's', 'š', 't',
            'u', 'v', 'w',
            'x', 'y', 'z', 'ž'
];

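1. Sort the alphabet by length so that the double letters come first. A regex alternation tries its alternatives from left to right, so 'lj' has to appear before 'l' or it would never match.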

// 2 letters first
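usort($alphabet, function($a, $b) {
    // usort expects an integer (<0, 0, >0), not a boolean
    if( mb_strlen($a) != mb_strlen($b) )
        return mb_strlen($b) - mb_strlen($a); // longer letters first
    else
        return strcmp($a, $b); // same length: byte-wise order
});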

var_dump($alphabet);
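
To illustrate why the ordering matters, here is a minimal sketch that matches the same word with two orderings of the alternatives. The word 'ljubav' and the variable names are only illustrative and are not part of the original answer.

```php
<?php
// Hypothetical demo: the same input matched with two orderings of the alternatives.
$word = 'ljubav';

// Single letters listed first: 'l' wins before 'lj' is ever tried.
preg_match_all('/l|j|lj|u|b|a|v/u', $word, $shortFirst);

// Double letter listed first: 'lj' is matched as one unit.
preg_match_all('/lj|l|j|u|b|a|v/u', $word, $longFirst);

echo implode(' ', $shortFirst[0]), "\n"; // l j u b a v
echo implode(' ', $longFirst[0]), "\n";  // lj u b a v
```

The regex engine commits to the first alternative that matches at the current position, which is why the sort in step 1 puts the longer letters at the front.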

2. Finally, split. I used preg_split, with preg_quote to escape any regex metacharacters in the alphabet.

// split
$alphabet = array_map('preg_quote', $alphabet); // protect preg_split
$pattern = implode('|', $alphabet); // 'dž|lj|nj|a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|ć|č|đ|š|ž'

var_dump($pattern);

// 'u' treats pattern and subject as UTF-8; the limit must be an integer (-1 = no limit), not null
var_dump( preg_split('`(' . $pattern . ')`siu', $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY) );

And the result :) Note that var_dump reports lengths in bytes, so č and ž show length 2 and dž shows length 3.

array (size=18)
  0 => string 'lj' (length=2)
  1 => string 'u' (length=1)
  2 => string 'b' (length=1)
  3 => string 'i' (length=1)
  4 => string 'č' (length=2)
  5 => string 'i' (length=1)
  6 => string 'c' (length=1)
  7 => string 'a' (length=1)
  8 => string 'j' (length=1)
  9 => string 'e' (length=1)
  10 => string 'c' (length=1)
  11 => string 'v' (length=1)
  12 => string 'i' (length=1)
  13 => string 'j' (length=1)
  14 => string 'e' (length=1)
  15 => string 'ž' (length=2)
  16 => string 'dž' (length=3)
  17 => string 't' (length=1)
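
The same idea can also be wrapped in a reusable function. The following is only a sketch under a few assumptions: it targets PHP 7 or later, the helper name split_letters is hypothetical, and it swaps preg_split for preg_match_all with a '.' fallback so that characters outside the alphabet are still returned one by one.

```php
<?php
// Hypothetical helper; the name and signature are illustrative, not from the answer.
function split_letters(string $text, array $alphabet): array
{
    // Longest letters first, so 'lj' is tried before 'l'.
    usort($alphabet, function ($a, $b) {
        return mb_strlen($b, 'UTF-8') - mb_strlen($a, 'UTF-8');
    });

    // Escape each letter for use inside a '/'-delimited pattern.
    $alternatives = array_map(function ($letter) {
        return preg_quote($letter, '/');
    }, $alphabet);

    // Fallback: any single character that is not covered by the alphabet.
    $alternatives[] = '.';

    // 'u' makes '.' consume a whole UTF-8 character, 's' lets it match newlines.
    $pattern = '/' . implode('|', $alternatives) . '/us';

    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

// Passing only the double letters is enough, thanks to the '.' fallback.
$letters = split_letters('ljubičicajecvijet', ['lj', 'nj', 'dž']);
print_r($letters);
```

With the question's input this yields the 16 letters listed in the question, starting with 'lj'.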

Upvotes: 1
