Reputation: 101
Is there a way to convert the ASCII representations ae, Ae, oe, Oe, ue, Ue and ss back to the original umlauts (and ß)? It is important that correct spelling is preserved: for example, the word "teuer" must not be changed to "teür". Thanks!
Upvotes: 5
Views: 4007
Reputation: 168685
This is going to be pretty tricky to get right. There certainly isn't any built-in function to do it.
Most of the examples I've seen for this kind of thing work in the opposite direction (i.e. taking a string with accented characters and replacing them with their ASCII equivalents). Where I have seen it done, it has always been a case of providing a map of characters and their equivalents, then scanning the string and doing the replacements.
The PHP manual page for the strtr() function has some good examples of the kind of thing you'd need to do, but your requirement to leave specific exceptions untouched is going to complicate the whole process enormously.
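To illustrate the map-plus-exceptions idea, here is a minimal sketch rather than a definitive solution. The function name restoreUmlauts and the hand-maintained exception list are assumptions made up for this example; "ss" is deliberately left out of the map because many German words legitimately contain it.

```php
<?php
// Hypothetical helper: restores umlauts from their ASCII digraphs, except in
// words found in $exceptions (an assumed, hand-maintained list of ASCII words).
function restoreUmlauts(string $text, array $exceptions): string
{
    $map = [
        'Ae' => 'Ä', 'Oe' => 'Ö', 'Ue' => 'Ü',
        'ae' => 'ä', 'oe' => 'ö', 'ue' => 'ü',
    ];

    // Lower-cased lookup table so that "Teuer" and "teuer" are both protected.
    $skip = array_flip(array_map('strtolower', $exceptions));

    // Walk the text word by word; \p{L}+ matches a run of letters.
    return preg_replace_callback('/\p{L}+/u', function (array $m) use ($map, $skip) {
        $word = $m[0];
        if (isset($skip[strtolower($word)])) {
            return $word; // listed exception: leave untouched
        }
        return strtr($word, $map); // replace every digraph in this word
    }, $text);
}

echo restoreUmlauts('Die Tuer ist teuer', ['teuer']), "\n"; // Die Tür ist teuer
```

The obvious weakness is that the exception list has to cover every word in which "ae", "oe" or "ue" is genuine (neue, Feuer, Bauer, Poesie, ...), which is exactly the complication mentioned above.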
Upvotes: 0
Reputation: 34978
I suggest you try every combination of replacements for the occurrences of "ue", "oe" and so on. By every combination I mean: if there are 3 occurrences, first replace only the first, then only the second, then only the third, then the first and second, and so on.
Next, check whether the results are contained in a standard spell-checking dictionary. This way you do not have to create your own dictionary of exceptions.
A wordlist can be found for example on ftp://ftp.ox.ac.uk/pub/wordlists/german/words.german.Z
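The approach above could be sketched roughly as follows. The function names umlautCandidates and restoreWord are hypothetical, and the tiny inline array only stands in for a real word list, which is assumed to be loaded as UTF-8 with the words as array keys. As a design choice for this sketch, the unchanged spelling is tried first, so a word that is already valid (like "teuer") is kept.

```php
<?php
// Hypothetical helper: returns every spelling obtained by replacing any
// subset of the digraphs in $word with the corresponding umlaut.
// The unchanged word is always the first element.
function umlautCandidates(string $word): array
{
    $map = [
        'Ae' => 'Ä', 'Oe' => 'Ö', 'Ue' => 'Ü',
        'ae' => 'ä', 'oe' => 'ö', 'ue' => 'ü',
    ];
    // Split so that each digraph becomes its own piece, e.g. "Tuer" -> T, ue, r.
    $parts = preg_split(
        '/(Ae|Oe|Ue|ae|oe|ue)/',
        $word,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $candidates = [''];
    foreach ($parts as $part) {
        // A digraph may stay as it is or become an umlaut; other pieces stay.
        $options = isset($map[$part]) ? [$part, $map[$part]] : [$part];
        $next = [];
        foreach ($candidates as $prefix) {
            foreach ($options as $option) {
                $next[] = $prefix . $option;
            }
        }
        $candidates = $next;
    }
    return $candidates;
}

// Hypothetical helper: returns the first candidate found in the dictionary,
// or the word unchanged if no candidate is a known word.
function restoreWord(string $word, array $dictionary): string
{
    foreach (umlautCandidates($word) as $candidate) {
        if (isset($dictionary[$candidate])) {
            return $candidate;
        }
    }
    return $word;
}

// Stand-in for a real word list (assumption: words are the array keys).
$dictionary = array_flip(['teuer', 'Tür', 'Bauer']);

echo restoreWord('Tuer', $dictionary), "\n";  // Tür
echo restoreWord('teuer', $dictionary), "\n"; // teuer
```

Note that a word with n digraphs produces 2^n candidates, which is fine for ordinary words but worth keeping in mind for very long compounds.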
Upvotes: 0
Reputation: 98750
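// Transliterates non-ASCII characters to ASCII; the exact output depends on the iconv implementation and locale
echo iconv("utf-8", "ascii//TRANSLIT", $input);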
Extended example
OR
echo strtr(utf8_decode($input),
utf8_decode('ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ'),
'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy');
Refer to this question.
Upvotes: 3