user2573552

Reputation: 129

Why can't Perl display all kinds of UTF-8 characters?

I am struggling with a Perl script I wrote that is supposed to handle IPA (International Phonetic Alphabet) characters. I use UTF-8 encoding for my Perl source file and for standard input/output, as follows:

#!/usr/local/bin/perl
use utf8;

binmode(STDOUT, ":utf8");          #treat as if it is UTF-8
binmode(STDIN, ":encoding(utf8)"); #actually check if it is UTF-8

However when I run this little test:

my %IPAchar = (
    "69"  => "i",    "65"  => "e",    "25b" => "ɛ",    ""    => "ɛ̃",
    ""    => "œ̃",    "153" => "œ",    "259" => "ə",    "f8"  => "ø",
    "79"  => "y",    "75"  => "u",    "6f"  => "o",    "254" => "ɔ",
    ""    => "ɔ̃",    "e3"  => "ɑ̃",    "251" => "ɑ",    "61"  => "a",
    "6a"  => "j",    "265" => "ɥ",    "77"  => "w",    "6e"  => "n",
    "272" => "ɲ",    "14b" => "ŋ",    "261" => "ɡ",    "6b"  => "k",
    "6d"  => "m",    "62"  => "b",    "70"  => "p",    "76"  => "v",
    "66"  => "f",    "64"  => "d",    "74"  => "t",    "292" => "ʒ",
    "283" => "ʃ",    "7a"  => "z",    "73"  => "s",    "281" => "ʁ",
    "6c"  => "l",    ""    => "h",    "294" => "ʔ",    "2e"  => ".",
    "280" => "ʀ",    "1dd" => "ǝ",    "72"  => "r",    "3b5" => "ε",
    "67"  => "g",    "25c" => "ɜ",    "2d0" => "ː",    "2c8" => "ˈ",
    "2b0" => "ʰ",    "26a" => "ɪ"
);

foreach my $k ( sort keys(%IPAchar) ) {
    print "\n[$k] /$IPAchar{$k}/";
}

not all characters are printed properly. This is weird, since characters such as "ä", "ø" or "ε" appear properly, but I cannot get the other, more specific characters to work, e.g. "ʃ", "ɜ", ...

If anyone could help, I would really appreciate it!

Thanks for reading,

Simon

Upvotes: 2

Views: 138

Answers (2)

user2573552

Reputation: 129

I can confirm it works fine. Here is how to set up GNU Unifont on Cygwin:

If you haven't already, install the X11 that comes with Cygwin. See Cygwin/X User's Guide http://x.cygwin.com/docs/ug/cygwin-x-ug.html for details. When selecting additional X11 utilities, be sure to add mkfontdir and xset from category X11.

Decide on a directory in which to place the GNU Unifont file. I chose ~/X11/font for the following.

cp unifont.pcf.gz ~/X11/font/unifont.pcf.gz
mkfontdir ~/X11/font

If not already running, start an X server, e.g. with startxwin

export DISPLAY=:0
xset +fp ~/X11/font
xterm -fn '-gnu-unifont-medium-r-normal--16-160-75-75-c-80-iso10646-1'

Upvotes: 0

Borodin

Reputation: 126722

Are you looking at the output of your program on a console or in an editor?

Even if your program is generating the correct character codes for the symbols you want, you have to be using a font that supports those symbols to display the text; otherwise the display won't make sense.
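One way to separate the two possibilities is to dump the code point and the UTF-8 bytes of each character instead of relying on how it is drawn. The following is only a minimal sketch: the helper name utf8_hex and the four sample characters are chosen for illustration and are not part of the original script.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use utf8;                      # the source file itself is UTF-8
use Encode qw(encode);

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: return the UTF-8 bytes of a string as hex pairs.
sub utf8_hex {
    my ($string) = @_;
    return join ' ', unpack '(H2)*', encode('UTF-8', $string);
}

# Two characters that displayed correctly and two that did not.
for my $char ('ø', 'ε', 'ʃ', 'ɜ') {
    printf "%s  U+%04X  %s\n", $char, ord($char), utf8_hex($char);
}
```

If the line for "ʃ" reports U+0283 and the bytes ca 83, the program is emitting the right data and the remaining problem is on the display side (terminal font or terminal encoding).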

It can be useful to open a text file using your browser, as web browsers have to accommodate pretty much any official encoding, and will usually be able to render the contents of your file correctly.
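To try that, the test output has to go to a file rather than to the console. A minimal sketch, assuming a hypothetical output file name ipa_test.txt and a handful of sample characters:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use utf8;

# Hypothetical file name; any writable path will do.
my $file = 'ipa_test.txt';

# Open the file with an explicit UTF-8 encoding layer.
open(my $out, '>:encoding(UTF-8)', $file)
    or die "Cannot open $file for writing: $!";

# Write one sample character per line.
print {$out} "$_\n" for ('ʃ', 'ɜ', 'ɥ', 'ʁ');

close($out) or die "Cannot close $file: $!";
```

Depending on the browser, you may have to select UTF-8 as the text encoding by hand, since a plain text file carries no encoding declaration.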

A quick search found this list of fonts that support the IPA symbols. If you use one of those then you should be able to see your output properly.

I highly recommend GNU Unifont, which has the best coverage of the Unicode character set of any font that I know. It's a sans-serif font.


Update

It worries me that your definition of the %IPAchar hash has multiple keys set to the null or empty string "". It's a perfectly valid hash key, but the nature of hashes means that you can have only one element with that as a key. Officially, the value of the hash element $IPAchar{''} is undefined in this situation. In practice it will be set to the last value in the list that has the same key, so in your case $IPAchar{''} = 'h'.
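The effect is easy to observe with a shortened version of the list. This is only an illustrative sketch; the @pairs array is introduced here for counting and does not appear in the original script.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use utf8;

binmode(STDOUT, ':encoding(UTF-8)');

# Three of the four pairs share the empty-string key, as in the question.
my @pairs   = ('' => 'ɛ̃', '' => 'œ̃', '6c' => 'l', '' => 'h');
my %IPAchar = @pairs;

# Only one element per distinct key survives the assignment.
printf "pairs in list: %d, keys in hash: %d\n",
    @pairs / 2, scalar keys %IPAchar;
print "value stored under the empty key: $IPAchar{''}\n";
```

Four pairs go in, but the hash ends up with only two keys, and the empty key holds 'h'. Giving every entry its own distinct key avoids losing the other values.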

Upvotes: 6
