tomekole

Is it possible to convert language-specific characters to Latin characters in UTF-8?

I am wondering whether there are any mappings or existing algorithms for converting national characters to their equivalent basic Latin characters within UTF-8.

For example (in Polish):

Ą -> A

Ó -> O

ż -> z

ź -> z ...

So a phrase like 'zażółć gęślą jaźń'

should convert to 'zazolc gesla jazn'.

Currently I am using a conversion array for Polish, but I am looking for a universal solution that handles all Latin-based languages.
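For reference, a per-language conversion table of the kind described above might look like this in Python. This is only a sketch; the names `POLISH` and `strip_polish` are hypothetical. It works, but every additional language needs its own hand-written table, which is why a general approach is preferable.

```python
# Hypothetical per-language table: each Polish letter maps to its base Latin letter.
# Both argument strings must have the same length (18 characters here).
POLISH = str.maketrans("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ", "acelnoszzACELNOSZZ")

def strip_polish(text: str) -> str:
    """Replace Polish diacritic letters using the fixed table above."""
    return text.translate(POLISH)

print(strip_polish("zażółć gęślą jaźń"))  # zazolc gesla jazn
```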

Thanks

Upvotes: 9

Answers (3)

tomekole

To make the answer complete: searching for 'Unicode decomposition + C#' led me to this CodeProject article (codeproject.com/KB/cs/UnicodeNormalization.aspx?display=Print), which offers a ready-to-use solution. Being able to name what you are looking for can't be overstated ;) Thanks for all the answers.
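The decomposition idea can be sketched in Python with the standard `unicodedata` module (this is not the CodeProject article's code, just the same technique). Normalizing to NFD splits a letter such as 'ż' into 'z' plus a combining mark, and the marks can then be dropped. One caveat: some letters, such as Polish 'ł', have no Unicode decomposition at all, so they need a small explicit fallback table. The names `to_basic_latin` and `EXTRA`, and the contents of that table, are illustrative assumptions, not an exhaustive list.

```python
import unicodedata

# Letters that have no canonical decomposition, so NFD leaves them untouched.
# Illustrative only; extend as needed for the languages you handle.
EXTRA = str.maketrans({"ł": "l", "Ł": "L", "ø": "o", "Ø": "O",
                       "đ": "d", "Đ": "D", "ß": "ss"})

def to_basic_latin(text: str) -> str:
    """Strip diacritics via NFD decomposition, then map undecomposable letters."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (accents, ogonek, dot above, ...), keep base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.translate(EXTRA)

print(to_basic_latin("zażółć gęślą jaźń"))  # zazolc gesla jazn
```

Without the `EXTRA` step, the same input would come out as 'zazołc gesla jazn', because 'ł' survives decomposition unchanged.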

Upvotes: 1

Pooli

I'm not completely sure this is the definitive answer you need, but when I've had to do this in the past, I converted all 'special' characters into named or numeric entities so that they were protected during the conversion process.
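As a rough illustration of the entity approach in Python (a sketch, not necessarily what the answerer used; `protect` and `restore` are hypothetical names): the `xmlcharrefreplace` error handler replaces every non-ASCII character with a numeric reference, and `html.unescape` turns it back. Note that this protects the characters rather than transliterating them, and that `html.unescape` would also decode any `&...;` sequences that were already present in the original text.

```python
import html

def protect(text: str) -> str:
    """Replace every non-ASCII character with a numeric reference like &#380;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def restore(text: str) -> str:
    """Turn numeric (and named) character references back into characters."""
    return html.unescape(text)

protected = protect("zażółć")
print(protected)  # za&#380;&#243;&#322;&#263;
print(restore(protected))  # zażółć
```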

Upvotes: 0

carlo.borreo

Check this:

http://sourceforge.net/projects/iconvnet/

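In general, search for iconv, the character-set conversion library and command-line tool (GNU iconv can transliterate unsupported characters via the //TRANSLIT suffix on the target encoding).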

Upvotes: 1

Related Questions