Reputation: 129
I am struggling with a Perl script I wrote that is supposed to handle IPA (International Phonetic Alphabet) characters. I use UTF-8 encoding for my Perl source file and for standard input/output, as follows:
#!/usr/local/bin/perl
use strict;
use warnings;
use utf8;
binmode(STDOUT, ":utf8"); #treat as if it is UTF-8
binmode(STDIN, ":encoding(utf8)"); #actually check if it is UTF-8
However, when I run this little test:
my %IPAchar = (
"69" => "i", "65" => "e", "25b" => "ɛ", "" => "ɛ̃",
"" => "œ̃", "153" => "œ", "259" => "ə", "f8" => "ø",
"79" => "y", "75" => "u", "6f" => "o", "254" => "ɔ",
"" => "ɔ̃", "e3" => "ɑ̃", "251" => "ɑ", "61" => "a",
"6a" => "j", "265" => "ɥ", "77" => "w", "6e" => "n",
"272" => "ɲ", "14b" => "ŋ", "261" => "ɡ", "6b" => "k",
"6d" => "m", "62" => "b", "70" => "p", "76" => "v",
"66" => "f", "64" => "d", "74" => "t", "292" => "ʒ",
"283" => "ʃ", "7a" => "z", "73" => "s", "281" => "ʁ",
"6c" => "l", "" => "h", "294" => "ʔ", "2e" => ".",
"280" => "ʀ", "1dd" => "ǝ", "72" => "r", "3b5" => "ε",
"67" => "g", "25c" => "ɜ", "2d0" => "ː", "2c8" => "ˈ",
"2b0" => "ʰ", "26a" => "ɪ"
);
foreach my $k ( sort keys(%IPAchar) ) {
print "\n[$k] /$IPAchar{$k}/";
}
not all characters are printed properly. This is odd, since characters such as "ä", "ø" or "ε" appear properly, but I cannot get other specific characters such as "ʃ" or "ɜ" to display.
If anyone could help, I would really appreciate it!
Thanks for reading,
Simon
Upvotes: 2
Views: 138
Reputation: 129
I can confirm it works fine. Here is how to set up GNU Unifont on Cygwin:
If you haven't already, install the X11 packages that come with Cygwin. See the Cygwin/X User's Guide http://x.cygwin.com/docs/ug/cygwin-x-ug.html for details. When selecting additional X11 utilities, be sure to add mkfontdir and xset from the X11 category.
Choose a directory for the GNU Unifont file. I chose ~/X11/font for the steps below.
mkdir -p ~/X11/font
cp unifont.pcf.gz ~/X11/font/unifont.pcf.gz
mkfontdir ~/X11/font
If an X server is not already running, start one, e.g. with startxwin. Then:
export DISPLAY=:0
xset +fp ~/X11/font
xterm -fn '-gnu-unifont-medium-r-normal--16-160-75-75-c-80-iso10646-1'
Upvotes: 0
Reputation: 126722
Are you looking at the output of your program on a console or in an editor?
Even if your program is generating the correct character codes for the symbols you want, you have to be using a font that supports those symbols to display the text; otherwise the display won't make sense.
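To check whether the program really emits the code points you expect, independently of any font, you can print each character's code point and Unicode name. This is a minimal sketch, assuming Perl 5.10 or later and the core charnames module; the helper name describe_codepoints is hypothetical, not part of the original code:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use charnames ();    # core module; provides charnames::viacode()

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: return one "U+XXXX NAME" string per code point in $str.
# A combining sequence such as open e + combining tilde yields two entries.
sub describe_codepoints {
    my ($str) = @_;
    return map {
        my $cp = ord($_);
        sprintf('U+%04X %s', $cp, charnames::viacode($cp) // '<unnamed>');
    } split //, $str;
}

# Symbols from the question, written as \x{...} escapes so this check does not
# depend on how the source file itself is encoded.
for my $char ("\x{283}", "\x{25C}", "\x{25B}\x{303}") {
    print "$char\n";
    print "    $_\n" for describe_codepoints($char);
}
```

If the names come out right (for example U+0283 LATIN SMALL LETTER ESH) but the glyph itself shows up as a box or question mark, the code is fine and the font is the problem.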
It can be useful to open a text file using your browser, as web browsers have to accommodate pretty much any official encoding, and will usually be able to render the contents of your file correctly.
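Along those lines, here is a minimal sketch that writes the symbols to an HTML file rather than a plain text file, so that a meta charset declaration tells the browser the encoding instead of leaving it to guess. The file name ipa_test.html and the helper write_html_preview are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: write @lines to an HTML file that declares its own
# encoding, so the browser decodes it as UTF-8 and can use its font fallback.
sub write_html_preview {
    my ($path, @lines) = @_;
    open(my $fh, '>:encoding(UTF-8)', $path) or die "Cannot open $path: $!";
    print {$fh} qq{<!DOCTYPE html>\n<html><head><meta charset="UTF-8">},
                qq{<title>IPA test</title></head><body>\n};
    for my $line (@lines) {
        # Escape the characters that are special in HTML text.
        (my $safe = $line) =~ s/&/&amp;/g;
        $safe =~ s/</&lt;/g;
        $safe =~ s/>/&gt;/g;
        print {$fh} "<p>$safe</p>\n";
    }
    print {$fh} "</body></html>\n";
    close($fh) or die "Cannot close $path: $!";
    return;
}

# Symbols from the question: esh, reversed open e, open e + combining tilde.
write_html_preview('ipa_test.html', "\x{283}", "\x{25C}", "\x{25B}\x{303}");
```

If the symbols render in the browser but not in your terminal, that points to the terminal font rather than the Perl code.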
A quick search turns up lists of fonts that support the IPA symbols. If you use one of those, you should be able to see your output properly.
I highly recommend GNU Unifont, which has the best coverage of the Unicode character set of any font that I know. It's a sans-serif font.
Update
It worries me that your definition of the %IPAchar hash has multiple keys set to the empty string "". That is a perfectly valid hash key, but a hash can hold only one element per key. When a key appears more than once in the list that initializes a hash, the last value wins, so in your case $IPAchar{''} = 'h' and the entries for "ɛ̃", "œ̃" and "ɔ̃" are lost.
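One way to avoid the collision is to derive each key from the character itself instead of typing it by hand. This is a sketch under some assumptions: the helper codepoint_key is hypothetical, and the key format (lowercase hex, with space-separated code points for combining sequences such as "25b 303") is chosen here only to resemble the question's keys:

```perl
use strict;
use warnings;

binmode(STDOUT, ':encoding(UTF-8)');

# The problem from the question: repeated '' keys collapse into a single
# element, and the last value in the initializer list wins.
my %collapsed = ('' => "\x{25B}\x{303}", '' => "\x{153}\x{303}", '' => 'h');
printf "%d element(s), value '%s'\n", scalar(keys %collapsed), $collapsed{''};

# Hypothetical helper: build a key from the lowercase hex code points of $char,
# e.g. "283" for esh and "25b 303" for open e + combining tilde, so that
# multi-code-point symbols get distinct, non-empty keys too.
sub codepoint_key {
    my ($char) = @_;
    return join ' ', map { sprintf('%x', ord($_)) } split //, $char;
}

# A subset of the symbols from the question, as \x{...} escapes.
my @symbols = (
    "\x{69}", "\x{25B}", "\x{25B}\x{303}", "\x{153}\x{303}",
    "\x{254}\x{303}", "\x{68}", "\x{283}",
);

# Derive each key from its value; the parentheses make map parse a BLOCK.
my %IPAchar = map { ( codepoint_key($_) => $_ ) } @symbols;

foreach my $k ( sort keys %IPAchar ) {
    print "[$k] /$IPAchar{$k}/\n";
}
```

Because every key is computed from its value, two different symbols can no longer share a key, and the nasal vowels keep their own entries.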
Upvotes: 6