Reputation: 1842
I need to replace special characters inside a string with other characters. For example, "ä" can be replaced by either "a" or "ae", and "à" by "a". Normally this is pretty easy to do in PHP, and there are lots of functions on Stack Overflow that already do exactly that.
Unfortunately my string looks like this: "u\u0308 a\u0302 a\u0308 o\u0300.zip" (ü â ä ò.zip). As you can see, my strings are file names, and OS X seems to convert the characters to Unicode (at least that is what I think).
I know that I could use a very long array with all special characters to replace them in PHP:
$str = "u\u0308 a\u0302 a\u0308 o\u0300.zip";
$ch = array("u\u0308", "a\u0302", "a\u0308", "o\u0300");
$chReplace = array("u", "a", "a", "o");
$str = str_replace($ch, $chReplace, $str);
But I'm wondering if there is an easier way, so I don't have to do this manually for every character?
Upvotes: 1
Views: 10692
Reputation: 76646
You can solve this problem by dividing it into multiple steps:
Convert the Unicode code points to actual entities. This can be easily achieved using preg_replace(). For an explanation of how the regex works, see my answer here.
Now you will have a set of characters like u&#x0308;. These are HTML entities. To convert them into their corresponding character forms, use html_entity_decode().
You will now have a UTF-8 string. You need to convert it into ISO-8859-1 (official ISO 8-bit Latin-1) using iconv(). The //TRANSLIT part enables transliteration: when a character can't be represented in the target charset, iconv() will try to approximate it with one or more similar-looking characters.
Code:
// Set the locale to something that's UTF-8 capable
setlocale(LC_ALL, 'en_US.UTF-8');
$str = "u\u0308 a\u0302 a\u0308 o\u0300";
// Convert the codepoints to entities
$str = preg_replace("/\\\\u([0-9a-fA-F]{4})/", "&#x\\1;", $str);
// Convert the entities to a UTF-8 string
$str = html_entity_decode($str, ENT_QUOTES, 'UTF-8');
// Convert the UTF-8 string to an ISO-8859-1 string
echo iconv("UTF-8", "ISO-8859-1//TRANSLIT", $str);
Output:
u a a o
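As a variation on the last step: the result of //TRANSLIT can differ between iconv implementations, so if the input really is a base letter followed by a combining mark (as in the question), the marks can also be removed directly with a Unicode-aware regex. This is only a minimal sketch under that assumption; the helper name stripCombiningMarks is made up for illustration and is not part of any library.

```php
<?php
// Hypothetical helper for illustration; assumes decomposed input (base letter + combining mark).
function stripCombiningMarks(string $str): string
{
    // Turn literal \uXXXX escapes into HTML entities, then decode them to UTF-8
    $str = preg_replace('/\\\\u([0-9a-fA-F]{4})/', '&#x\\1;', $str);
    $str = html_entity_decode($str, ENT_QUOTES, 'UTF-8');

    // \p{Mn} matches non-spacing marks such as U+0308; the /u flag enables UTF-8 mode
    return preg_replace('/\p{Mn}+/u', '', $str);
}

// Single quotes keep the backslashes literal
echo stripCombiningMarks('u\u0308 a\u0302 a\u0308 o\u0300.zip'), "\n";
```

This prints u a a o.zip. It only handles decomposed input; a precomposed "ü" (U+00FC) would first have to be decomposed, for example with Normalizer::normalize($str, Normalizer::FORM_D) from the intl extension, if that extension is available.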
Upvotes: 3