MayRiv

Reputation: 35

Does transliteration from UTF-8 to CP1251 ever replace one character with several characters?

I use the iconv function with the //TRANSLIT option.

Are there cases where transliteration from UTF-8 to CP1251 replaces one character with several characters? Where can I look up that information?

Upvotes: 1

Views: 867

Answers (2)

n. m. could be an AI

Reputation: 120239

The most obvious one is

$ echo 'ß' | iconv -f UTF-8 -t CP1251//TRANSLIT
ss

In addition, if your locale is German, umlauts are transliterated according to German rules (yes, transliteration is locale-dependent).

$ export LC_ALL=de_DE.UTF-8
$ echo 'Füße' | iconv -f utf-8 -t CP1251//TRANSLIT
Fuesse

(Some versions will print F"usse instead).
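
To see the locale dependence on a particular system, the same conversion can be run under more than one locale. This is only a minimal sketch: it assumes an iconv that accepts //TRANSLIT (such as the one from GNU libc) and that de_DE.UTF-8 is installed (check with `locale -a`). `translit_in` is a hypothetical helper name, not part of iconv.

```shell
# Hypothetical helper: transliterate a UTF-8 string to CP1251 under a given locale.
# Usage: translit_in LOCALE STRING
translit_in() {
    printf '%s\n' "$2" | LC_ALL="$1" iconv -f UTF-8 -t 'CP1251//TRANSLIT'
}

# Compare the result in the C locale and in a German locale.
# If de_DE.UTF-8 is not installed, both results will likely be the same.
for loc in C de_DE.UTF-8; do
    printf '%s: ' "$loc"
    translit_in "$loc" 'Füße' || printf '(iconv exited with status %s)\n' "$?"
done
```

The exact output is deliberately not shown here, since it varies with the iconv version and the locales available.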

Upvotes: 0

ecatmur

Reputation: 157504

There are some, depending on the implementation and locale:

$ echo '℀⇒½' | iconv -f UTF8 -t CP1251//TRANSLIT
a/c=> 1/2 

These are, respectively, U+2100 ACCOUNT OF transliterated as a/c, U+21D2 RIGHTWARDS DOUBLE ARROW transliterated as =>, and U+00BD VULGAR FRACTION ONE HALF transliterated as 1/2 (including spaces).

I found these in the GNU libc source code, https://github.com/lattera/glibc/blob/master/locale/C-translit.h.in; different implementations may not transliterate these characters the same way if at all.
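
Since CP1251 is a single-byte encoding, one way to check what your own implementation does is to count the output bytes for a single input character: more than one byte means the character was expanded. A rough sketch, assuming an iconv that accepts //TRANSLIT; `translit_bytes` is a hypothetical helper name, and the sample characters are just the ones mentioned in this thread plus one Cyrillic letter for comparison.

```shell
# Hypothetical helper: convert one UTF-8 string to CP1251 with transliteration
# and print the number of bytes produced (0 if iconv produced no output).
translit_bytes() {
    printf '%s' "$1" | iconv -f UTF-8 -t 'CP1251//TRANSLIT' 2>/dev/null | wc -c | tr -d ' '
}

# A count above 1 for a single character means it was replaced by several characters.
for ch in 'ß' '½' '⇒' 'Я'; do
    printf '%s -> %s byte(s)\n' "$ch" "$(translit_bytes "$ch")"
done
```

A character that CP1251 encodes directly, such as Я, should come out as exactly one byte; the counts for the others depend on the implementation and locale.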

Upvotes: 3
