Reputation: 519
I need to split a string into an array of letters. The problem is that my language (Croatian) also has two-character letters (e.g. lj, nj, dž).
So a string such as ljubičicajecvijet
should be split into an array that would look like this:
Array
(
[0] => lj
[1] => u
[2] => b
[3] => i
[4] => č
[5] => i
[6] => c
[7] => a
[8] => j
[9] => e
[10] => c
[11] => v
[12] => i
[13] => j
[14] => e
[15] => t
)
Here is the list of Croatian letters in an array (I included the English letters as well).
$alphabet= array(
'a', 'b', 'c',
'č', 'ć', 'd',
'dž', 'đ', 'e',
'f', 'g', 'h',
'i', 'j', 'k',
'l', 'lj', 'm',
'n', 'nj', 'o',
'p', 'q', 'r',
's', 'š', 't',
'u', 'v', 'w',
'x', 'y', 'z', 'ž'
);
Upvotes: 3
Views: 731
Reputation: 3178
Alternatively, you can check each pair of characters against $alphabet and keep the pair together when it matches (for this solution you could reduce the $alphabet array to just the double-character letters):
<?php
ini_set('display_errors',1); // this should be commented out in production environments
error_reporting(E_ALL); // this should be commented out in production environments
$string = 'ljubičicajecvijet';
$alphabet= [
'a', 'b', 'c',
'č', 'ć', 'd',
'dž', 'đ', 'e',
'f', 'g', 'h',
'i', 'j', 'k',
'l', 'lj', 'm',
'n', 'nj', 'o',
'p', 'q', 'r',
's', 'š', 't',
'u', 'v', 'w',
'x', 'y', 'z', 'ž'
];
function str_split_unicode($str, $length = 1) {
    // The empty pattern with the u modifier splits between UTF-8 characters
    $tmp = preg_split('~~u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($length > 1) {
        $chunks = array_chunk($tmp, $length);
        foreach ($chunks as $i => $chunk) {
            $chunks[$i] = join('', (array) $chunk);
        }
        $tmp = $chunks;
    }
    return $tmp;
}
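// Splitting into fixed chunks of 2 only finds double letters that start at an
// even offset, so walk the single characters and look one character ahead instead.
$chars = str_split_unicode($string);
$new_array = [];
for ($i = 0, $n = count($chars); $i < $n; $i++) {
    $pair = $chars[$i] . ($chars[$i + 1] ?? '');
    if (isset($chars[$i + 1]) && in_array($pair, $alphabet, true)) {
        $new_array[] = $pair; // the pair is a double letter such as lj, nj or dž
        $i++;                 // skip the second half of the pair
    } else {
        $new_array[] = $chars[$i];
    }
}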
print_r($new_array);
?>
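The str_split_unicode helper above splits on characters rather than bytes, which matters for letters such as č. A minimal sketch of the difference, assuming the string is UTF-8 encoded (the variable names are only illustrative):

```php
<?php
// Assumes this file and the string are UTF-8 encoded.
$word = 'čaj';

// str_split() works on bytes, so the two-byte "č" is cut in half.
$bytes = str_split($word);

// The empty pattern with the u modifier splits between code points.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), "\n"; // 4
echo count($chars), "\n"; // 3
print_r($chars);          // č, a, j
```

The same applies to length checks: strlen() counts bytes, so a two-letter chunk containing č reports a length of 3.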
Upvotes: 1
Reputation: 992
You can use this kind of solution:
Data:
$text = 'ljubičicajecviježdžt';
$alphabet = [
'a', 'b', 'c',
'č', 'ć', 'd',
'dž', 'đ', 'e',
'f', 'g', 'h',
'i', 'j', 'k',
'l', 'lj', 'm',
'n', 'nj', 'o',
'p', 'q', 'r',
's', 'š', 't',
'u', 'v', 'w',
'x', 'y', 'z', 'ž'
];
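1. Sort the alphabet by length so that the double letters come first. The regex alternation built in step 2 tries its alternatives from left to right, so 'lj' has to appear before 'l'.
usort($alphabet, function ($a, $b) {
    // usort() expects an integer, not a boolean: longer letters first, then byte order
    $byLength = mb_strlen($b) <=> mb_strlen($a);
    return $byLength !== 0 ? $byLength : strcmp($a, $b);
});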
var_dump($alphabet);
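To see why the order of the alternatives matters, here is a small sketch that matches the same word with two orderings. The word and the shortened letter lists are only illustrative, and it uses preg_match_all rather than preg_split to keep the output compact:

```php
<?php
// Illustrative demo: the same word matched with two alternation orders.
$word = 'ljeto';

// Single letter first: "l" matches before "lj" is ever tried.
preg_match_all('/l|lj|j|e|t|o/u', $word, $shortFirst);

// Double letter first: "lj" is matched as one letter.
preg_match_all('/lj|l|j|e|t|o/u', $word, $longFirst);

print_r($shortFirst[0]); // l, j, e, t, o
print_r($longFirst[0]);  // lj, e, t, o
```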
2. Finally, split. I used the preg_split function, with preg_quote to escape the letters before they go into the pattern.
// split
$alphabet = array_map('preg_quote', $alphabet); // protect preg_split
$pattern = implode('|', $alphabet); // 'dž|lj|nj|a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|ć|č|đ|š|ž'
var_dump($pattern);
// u: treat pattern and subject as UTF-8 (also needed for case-insensitive č, ž, ...); -1: no limit
var_dump( preg_split('`(' . $pattern . ')`ui', $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY) );
And the result :)
array (size=18)
0 => string 'lj' (length=2)
1 => string 'u' (length=1)
2 => string 'b' (length=1)
3 => string 'i' (length=1)
4 => string 'č' (length=2)
5 => string 'i' (length=1)
6 => string 'c' (length=1)
7 => string 'a' (length=1)
8 => string 'j' (length=1)
9 => string 'e' (length=1)
10 => string 'c' (length=1)
11 => string 'v' (length=1)
12 => string 'i' (length=1)
13 => string 'j' (length=1)
14 => string 'e' (length=1)
15 => string 'ž' (length=2)
16 => string 'dž' (length=3)
17 => string 't' (length=1)
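The two steps can also be wrapped into one reusable function. This is only a sketch: split_letters is a hypothetical name, it uses preg_match_all instead of preg_split, and it assumes UTF-8 input and the mbstring extension. Characters that are not in the alphabet are returned as single items.

```php
<?php
/**
 * Hypothetical helper (not part of the answer above): splits $text into
 * letters, treating every multi-character entry of $alphabet as one letter.
 * Assumes UTF-8 input and the mbstring extension.
 */
function split_letters(string $text, array $alphabet): array
{
    // Longest entries first so that "dž" is tried before "d".
    usort($alphabet, function ($a, $b) {
        return mb_strlen($b) <=> mb_strlen($a);
    });

    // Escape each letter for use inside a "/"-delimited pattern.
    $alternatives = array_map(function ($letter) {
        return preg_quote($letter, '/');
    }, $alphabet);

    // Fallback: any single character that is not in the alphabet.
    $alternatives[] = '.';

    // u: UTF-8 mode, s: let "." match newlines, i: case-insensitive.
    $pattern = '/' . implode('|', $alternatives) . '/usi';

    preg_match_all($pattern, $text, $matches);
    return $matches[0] ?? [];
}

print_r(split_letters('džungla', ['d', 'dž', 'ž', 'u', 'n', 'g', 'l', 'a']));
// dž, u, n, g, l, a
```

Because of the fallback alternative, the alphabet passed in can be reduced to just the double letters if preferred.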
Upvotes: 1