Split a Unicode string by length in PHP

Question

I need to split a Unicode string into an array of chunks of 70 characters, so that each value in the resulting array is a string 70 characters long. This is my code:

$msg = preg_replace('/[\r\n]+/', ' ', $smsContent);
$chunks = wordwrap($msg, 70, "\n");
$chunks = explode("\n", $chunks);
//print_r($chunks);

But the resulting array contains values of varying lengths.
Here is an example:

$smsContent = "सभी मनुष्यों कोगौरव और अधिकारों के मामले में जनजात स्वतंत्रता और समानता प्राप्त है | उन्हें बुद्धि और अन्तरात्मा कि देन प्राप्त है |";

Result:

Array
(
    [0] => सभी मनुष्यों कोगौरव और अधि
    [1] => कारों के मामले में जनजात स�
    [2] => �वतंत्रता और समानता प्राप्
    [3] => त है | उन्हें बुद्धि और अन्त
    [4] => रात्मा कि देन प्राप्त है |

)

I need to split it into values that are 70 characters long, but the result is not correct. I also need to keep words from being split across chunks.

Casimir et Hippolyte · Accepted Answer

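You can't use an approach based on the number of bytes, because your string contains multibyte characters and possibly combining characters. You have to work with characters rather than bytes. One way to do that is a pattern with the u modifier and the character classes [:graph:] and [:print:]. The upper bound of the quantifier sets the maximum chunk length (here 31 characters: one from [:graph:] plus up to 30 from [:print:]), and the lookahead (?!\S) makes each chunk end just before whitespace or at the end of the string, so words are not split: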

preg_match_all('~[[:graph:]][[:print:]]{0,30}(?!\S)~u', $smsContent, $m);
print_r($m[0]);
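
To make the chunk length configurable, the pattern above can be wrapped in a small function. This is only a sketch: the name splitByCodePoints and its $max parameter are hypothetical and not part of the original answer, and it assumes the input has no single word longer than $max. Note that with the u modifier the quantifier counts code points, so a combining mark counts as one character of its own.

```php
<?php
// Hypothetical helper (not from the original answer): splits $text into
// chunks of at most $max code points without breaking words.
function splitByCodePoints(string $text, int $max): array
{
    if ($max < 1) {
        throw new InvalidArgumentException('$max must be at least 1');
    }
    // Collapse every whitespace run (including line breaks) into one space.
    $text = trim(preg_replace('~\s+~u', ' ', $text));
    // One visible character, then up to $max - 1 printable characters,
    // ending just before whitespace or at the end of the string.
    $pattern = '~[[:graph:]][[:print:]]{0,' . ($max - 1) . '}(?!\S)~u';
    preg_match_all($pattern, $text, $m);
    // Caveat: a word longer than $max cannot fit in any chunk, so its
    // leading part is skipped by the regex engine.
    return $m[0];
}

$chunks = splitByCodePoints("one two three four five", 9);
// $chunks is ['one two', 'three', 'four five']
```

Calling splitByCodePoints($smsContent, 70) would then give chunks of at most 70 code points each.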


You can also try to play with the grapheme functions from intl.
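
As a rough illustration of the grapheme route, the sketch below packs whole words into chunks and measures length in grapheme clusters, so a base letter plus its combining marks counts once. The function names graphemeLength and wrapByGraphemes are hypothetical. It uses grapheme_strlen from the intl extension when that is available, and otherwise assumes that counting matches of the PCRE \X escape is an acceptable substitute.

```php
<?php
// Hypothetical helper: length of $s in grapheme clusters.
function graphemeLength(string $s): int
{
    if (function_exists('grapheme_strlen')) {
        return (int) grapheme_strlen($s); // intl extension
    }
    // Fallback (assumption): \X matches one extended grapheme cluster.
    return (int) preg_match_all('~\X~u', $s);
}

// Hypothetical helper: greedy word wrap with a limit of $max graphemes.
function wrapByGraphemes(string $text, int $max): array
{
    $words = preg_split('~\s+~u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];
    $current = '';
    foreach ($words as $word) {
        $candidate = $current === '' ? $word : $current . ' ' . $word;
        if ($current !== '' && graphemeLength($candidate) > $max) {
            // The next word does not fit: close the current chunk.
            $chunks[] = $current;
            $current = $word;
        } else {
            $current = $candidate;
        }
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    // A single word longer than $max is kept whole in its own chunk.
    return $chunks;
}

$chunks = wrapByGraphemes("one two three four five", 9);
// $chunks is ['one two', 'three', 'four five']
```

Unlike the regex, this version never drops text: an over-long word simply produces a chunk that exceeds the limit, which the caller can detect with graphemeLength.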
