geeth

Reputation: 714

Split a Unicode string by length in PHP

I need to split a Unicode string into an array of 70-character pieces, so that each value in the resulting array is a string up to 70 characters long. This is my code:

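// Replace every run of line breaks with a single space
$msg = preg_replace('/[\r\n]+/', ' ', $smsContent);
// Wrap at width 70; in single quotes '\n' is a literal two-character marker, not a newline
$chunks = wordwrap($msg, 70, '\n');
// Split the wrapped text on that same marker
$chunks = explode('\n', $chunks);
//print_r($chunks);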

But the values in the result array have different lengths.
Here is an example:

$smsContent = "सभी मनुष्यों कोगौरव और अधिकारों के मामले में जनजात स्वतंत्रता और समानता प्राप्त है | उन्हें बुद्धि और अन्तरात्मा कि देन प्राप्त है |";

Result:

Array
(
    [0] => सभी मनुष्यों कोगौरव और अधि
    [1] => कारों के मामले में जनजात स�
    [2] => �वतंत्रता और समानता प्राप्
    [3] => त है | उन्हें बुद्धि और अन्त
    [4] => रात्मा कि देन प्राप्त है |

)

I need to split it into values of 70 characters, but the result is not correct. I also need to avoid splitting words.

Upvotes: 0

Views: 186

Answers (2)

Casimir et Hippolyte

Reputation: 89584

You can't use an approach based on the number of bytes, because your string contains multibyte characters and possibly combining characters. You have to count characters rather than bytes. One way to do that is with the POSIX character classes [:graph:] and [:print:] together with the u modifier:

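// u: UTF-8 mode. Each chunk is one non-space character plus up to 69 more printable
// characters (spaces included), and must end right before whitespace or the end of the string.
preg_match_all('~[[:graph:]][[:print:]]{0,69}(?!\S)~u', $smsContent, $m);
print_r($m[0]);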
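To see why byte counts and character counts diverge for this text, here is a small check on one word from the question, using only core PHP and PCRE (no mbstring or intl assumed). The escapes spell मनुष्यों so the exact code points are unambiguous. The grapheme count depends on the Unicode version bundled with your PCRE library, so the sketch prints it rather than assuming a value.

```php
<?php
// The word "मनुष्यों" written as explicit code points (PHP 7.0+ \u{...} escapes):
// म न ु ष ् य ो ं
$word = "\u{092E}\u{0928}\u{0941}\u{0937}\u{094D}\u{092F}\u{094B}\u{0902}";

// strlen() counts bytes: every Devanagari code point takes 3 bytes in UTF-8.
$bytes = strlen($word);

// With the u flag, "." matches one whole code point, so this counts code points.
$codePoints = preg_match_all('/./su', $word);

// \X matches one extended grapheme cluster (a user-perceived character).
$graphemes = preg_match_all('/\X/u', $word);

printf("bytes=%d code points=%d graphemes=%d\n", $bytes, $codePoints, $graphemes);
```

Here strlen() reports 24 bytes for 8 code points, which is why a limit of 70 bytes ends up holding only about 23 Devanagari characters.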


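To count glyphs (grapheme clusters) instead of code points, so that a consonant and its combining vowel signs count as one unit, you can also try the grapheme functions from the intl extension.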
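A minimal sketch of a word-preserving split that counts graphemes, assuming greedy line filling (like wordwrap()) is acceptable. Instead of the intl grapheme_* functions it swaps in PCRE's \X, which also matches one extended grapheme cluster and needs no extra extension. The function name wrapByGraphemes is hypothetical, and a single word longer than the limit is kept whole on its own line rather than cut.

```php
<?php
/**
 * Hypothetical helper: wrap $text into chunks of at most $width grapheme
 * clusters without splitting words. Graphemes are counted with PCRE's \X
 * instead of intl. A word longer than $width is kept whole on its own line.
 */
function wrapByGraphemes(string $text, int $width): array
{
    // Split on any run of Unicode whitespace and drop empty pieces.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];
    $current = '';
    $currentLen = 0;

    foreach ($words as $word) {
        $wordLen = preg_match_all('/\X/u', $word);
        if ($current === '') {
            // Start a new chunk with this word.
            $current = $word;
            $currentLen = $wordLen;
        } elseif ($currentLen + 1 + $wordLen <= $width) {
            // The word plus one separating space still fits.
            $current .= ' ' . $word;
            $currentLen += 1 + $wordLen;
        } else {
            // Close the current chunk and start a new one with this word.
            $chunks[] = $current;
            $current = $word;
            $currentLen = $wordLen;
        }
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}

$smsContent = "सभी मनुष्यों कोगौरव और अधिकारों के मामले में जनजात स्वतंत्रता और समानता प्राप्त है | उन्हें बुद्धि और अन्तरात्मा कि देन प्राप्त है |";
print_r(wrapByGraphemes($smsContent, 70));
```

Counting graphemes rather than code points keeps visible line lengths closer to what a reader sees; switch the \X pattern to '/./su' if your limit (for example an SMS segment size) is defined in code points instead.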

Upvotes: 1

prakash tank

Reputation: 1267

Use the str_split() function:

$smsContent = "सभी मनुष्यों कोगौरव और अधिकारों के मामले में जनजात स्वतंत्रता और समानता प्राप्त है | उन्हें बुद्धि और अन्तरात्मा कि देन प्राप्त है |";
$output = str_split($smsContent, 70);
print_r($output);
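Note that str_split() counts bytes, not characters, so on UTF-8 Devanagari text a 70-byte piece can end in the middle of a character, which matches the broken characters in the question's output. A character-based variant of the same idea, sketched here with a core PCRE pattern (PHP 7.4+ also offers mb_str_split() in mbstring), keeps every piece valid UTF-8, although it still does not keep words whole:

```php
<?php
$smsContent = "सभी मनुष्यों कोगौरव और अधिकारों के मामले में जनजात स्वतंत्रता और समानता प्राप्त है | उन्हें बुद्धि और अन्तरात्मा कि देन प्राप्त है |";

// Split into pieces of at most 70 code points (not bytes). The u flag makes "."
// match one whole UTF-8 code point, and s lets it match newlines too.
// Like str_split(), this still cuts through words and may separate a
// combining vowel sign from its consonant.
preg_match_all('/.{1,70}/su', $smsContent, $m);
$output = $m[0];
print_r($output);
```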

Upvotes: -1
