Reputation: 699
I'm trying to process an array of tweets using array_walk to encode the text into UTF-8 so that any Chinese characters are handled properly.
array_walk($tweet_data, function (&$tweet, $key) {
    $tweet['text'] = iconv('Windows-1250', 'UTF-8', $tweet['text']);
});
When I do this, I get the error "Detected an illegal character in input string"
I've also tried this using utf8_encode.
array_walk($tweet_data, function (&$tweet, $key) {
    $tweet['text'] = utf8_encode($tweet['text']);
});
And this passes through without any issue, but when the text is then displayed on the page, the characters are all wrong.
How can I properly handle UTF-8 characters before passing the data to json_encode so it doesn't break?
Upvotes: 1
Views: 2586
Reputation: 146460
Windows-1250 cannot encode Chinese:
Windows-1250 is a code page used under Microsoft Windows to represent texts in Central European and Eastern European languages that use Latin script, such as Polish, Czech, Slovak, Hungarian, Slovene, Bosnian, Croatian, Serbian (Latin script), Romanian (before 1993 spelling reform) and Albanian. It may also be used with the German language
Neither can ISO-8859-1, which is the input encoding utf8_encode assumes:
[ISO-8859-1] is generally intended for Western European languages.
I think you are trying to convert from A to B and you don't know what A is. If you're fully sure it isn't UTF-8 already, you should at least try an encoding that's specifically designed to hold that language.
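A quick way to test whether the text is already UTF-8 is sketched below. It is only an illustration: the `$samples` array is hypothetical data, and it assumes the mbstring extension is available. `mb_check_encoding` reports whether the bytes are valid UTF-8, and `json_encode` returns false when they are not.

```php
<?php
// Hypothetical sample data: one string that is already UTF-8,
// and one that is ISO-8859-1 (not valid UTF-8).
$samples = [
    'utf8'   => "\xE4\xBD\xA0\xE5\xA5\xBD", // "你好" as UTF-8 bytes
    'latin1' => "caf\xE9",                  // "café" as ISO-8859-1 bytes
];

foreach ($samples as $label => $text) {
    // true only when the bytes form valid UTF-8
    $valid = mb_check_encoding($text, 'UTF-8');

    // json_encode returns false for input that is not valid UTF-8;
    // JSON_UNESCAPED_UNICODE keeps non-ASCII characters readable in the output
    $json = json_encode($text, JSON_UNESCAPED_UNICODE);

    printf(
        "%s: valid UTF-8 = %s, json_encode = %s\n",
        $label,
        var_export($valid, true),
        var_export($json, true)
    );
}
```

If every tweet passes the check, no conversion is needed, and running iconv or utf8_encode on the text anyway would be what corrupts the characters.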
Upvotes: 1
Reputation: 6114
This simple PHP function recursively converts all values of an array to UTF-8. The mb_detect_encoding call (line 4) checks whether the value is already UTF-8, so it is not converted a second time.
function utf8_converter($array)
{
    array_walk_recursive($array, function (&$item, $key) {
        if (!mb_detect_encoding($item, 'utf-8', true)) {
            $item = utf8_encode($item);
        }
    });
    return $array;
}
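As a hedged variation on the function above: utf8_encode always treats its input as ISO-8859-1 and is deprecated as of PHP 8.2, so the same idea can be written with mb_convert_encoding and an explicit source encoding. The name `to_utf8_recursive`, the `$from` parameter, and the sample `$tweets` array are hypothetical. The source encoding has to be known in advance; this sketch does not guess it.

```php
<?php
// Hypothetical helper: convert every string in a nested array that is not
// already valid UTF-8, assuming such strings are encoded in $from.
function to_utf8_recursive(array $data, string $from = 'ISO-8859-1'): array
{
    array_walk_recursive($data, function (&$item) use ($from) {
        // Skip non-strings and anything that is already valid UTF-8
        if (is_string($item) && !mb_check_encoding($item, 'UTF-8')) {
            $item = mb_convert_encoding($item, 'UTF-8', $from);
        }
    });
    return $data;
}

// Hypothetical sample data
$tweets = [
    ['text' => "caf\xE9"],                  // ISO-8859-1 bytes, gets converted
    ['text' => "\xE4\xBD\xA0\xE5\xA5\xBD"], // already UTF-8, left untouched
];

$json = json_encode(to_utf8_recursive($tweets), JSON_UNESCAPED_UNICODE);
echo $json, "\n";
```

Passing `JSON_UNESCAPED_UNICODE` is optional; without it json_encode emits `\uXXXX` escapes, which decode to the same characters.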
Upvotes: 3