
c# windows unicode

Kelson Ball

Reputation: 983

C# Mapping ALT+<XXX> codes to characters

I am trying to write a script that builds a table of Alt codes. For example, on Windows the key combination ALT+227 prints the character π, ALT+7899 prints █, and so on.

I have tried several ways of getting at that data in C#, but every method seems to come up with a different value. Casting 227 to a char does not yield π, nor does using the System.Text encoding options.

How can I correctly map alt codes to characters?

Upvotes: 1

Views: 1488

Answers (1)

Reputation: 2373

The whole Alt+XXX mechanism is a leftover from pre-Unicode days. In the days of MS-DOS, character codes were 8-bit, and this combination entered character number XXX. PC users in the US, Canada and Western Europe used the original character set of the IBM PC, which was later called codepage 437. PC users in other countries used other codepages (for example, the ex-USSR Cyrillic codepage 866 or the Central European codepage 852). Because screens ran in text mode, these codepages also had to include pseudographic (box-drawing) codepoints.

When Windows was designed, Microsoft decided to introduce its own set of codepages, with a more consistent positioning of characters; also, Windows ran in graphic mode, so pseudographic codepoints were unnecessary, leaving room for more character glyphs. However, at that time, characters were still mostly 8-bit (because memory was precious), Unicode was only a bright prospect for the future, and robust encoding methods such as UTF-8 did not exist. So, different countries still needed different codepages.

Windows relied heavily on MS-DOS and could run MS-DOS programs; also, many users had got used to the Alt+XXX trick. Therefore, a Windows configuration now included two codepages: the so-called OEM (original equipment manufacturer) codepage for MS-DOS applications (and, later, Windows console applications), and the so-called ANSI codepage for Windows applications (the name comes from an ANSI draft on which codepage 1252 was based; strictly speaking, these codepages are not ANSI standards). For example, a computer in the US would typically be configured with OEM codepage 437 and ANSI codepage 1252; a computer in Russia, with OEM 866 and ANSI 1251. To spare users from thinking too much about codepages, Windows, when handling the familiar Alt+XXX keystrokes in Windows applications, automatically remapped them from OEM to ANSI. Additionally, Alt+0XXX (with a leading zero) was introduced for those who wanted to take advantage of the new ANSI characters.

You might be amazed to find out that all that still holds! Even though we now have Unicode, UTF-8 and other nice things, our systems still have a notion of two codepages, used for compatibility with applications that still use 8-bit character codes, OEM for console and ANSI for graphic mode. If you check codepage 437 you'll find that it contains π in position 227. And, if codepage 437 is configured as your OEM codepage, Alt+227 enters π. On my computer, Alt+227 enters у (Cyrillic) because my computer is configured to OEM codepage 866.
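To make the codepage dependence concrete, here is a minimal sketch that decodes the same byte value under different codepages. The class and method names are invented for illustration, and the provider registration assumes .NET Core 3.0+ or .NET 5+; on .NET Framework the legacy codepages are available without it, so that line can be dropped.

```csharp
using System.Text;

static class CodePageDemo
{
    // Decodes one byte value under the given Windows codepage.
    public static char Decode(byte value, int codePage)
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where legacy codepages such as
        // 437, 866 or 1252 only become available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        return Encoding.GetEncoding(codePage).GetChars(new[] { value })[0];
    }
}
```

With this helper, CodePageDemo.Decode(227, 437) gives π and CodePageDemo.Decode(227, 866) gives the Cyrillic у, matching the two Alt+227 results described above. CodePageDemo.Decode(227, 1252) gives ã, which is what Alt+0227 enters on a system whose ANSI codepage is 1252.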

Now, why does Alt+7899 enter █? The original IBM PC keyboard interrupt handler did not handle numeric overflow, so, when you entered Alt+7899, it honestly computed (char)(((7 * 10 + 8) * 10 + 9) * 10 + 9) (NB: char was 8 bits!), which gave 219 (that is, 7899 mod 256). Codepoint 219 in codepage 437 is █. This overflow logic is still kept in modern Windows systems.
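The wrap-around can be reproduced in C# with an 8-bit accumulator. This is only a sketch of the arithmetic described above, not the actual handler code; the class and method names are hypothetical.

```csharp
static class AltOverflow
{
    // Accumulates typed decimal digits into a single byte, letting the value
    // wrap around at 256 just like the 8-bit computation described above.
    public static byte Accumulate(string digits)
    {
        byte acc = 0;
        foreach (char c in digits)
        {
            // acc * 10 + digit is computed as an int; the unchecked cast
            // keeps only the low 8 bits of the result.
            acc = unchecked((byte)(acc * 10 + (c - '0')));
        }
        return acc;
    }
}
```

AltOverflow.Accumulate("7899") goes through 7, 78, 21 (789 mod 256) and ends at 219. For a number that is already available as an int, unchecked((byte)number) gives the same result in one step.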

This story is incomplete: for Far Eastern languages, whose character sets don't fit into a single byte, some of the above may not apply.

Now, what's the answer to your question? You need to map codepoint XXX in your system's OEM codepage to Unicode. System.Text.Encoding.GetEncoding(System.Globalization.CultureInfo.CurrentCulture.TextInfo.OEMCodePage) (or, in console applications, System.Console.OutputEncoding) returns a System.Text.Encoding instance for the OEM codepage of your computer. Its GetChars method then converts OEM codepoints (passed as bytes) to Unicode characters.
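Putting the pieces together, a table of Alt codes could be built roughly as follows. This is a sketch under a few assumptions: the class name and the optional codepage parameter are invented for illustration, the OEM codepage is assumed to be single-byte, and the provider registration is only needed on .NET Core 3.0+ / .NET 5+.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

static class AltCodeTable
{
    // Builds a map from Alt+XXX codes (1..255, no leading zero) to characters.
    // If no codepage is given, the current culture's OEM codepage is used.
    public static Dictionary<int, char> Build(int? oemCodePage = null)
    {
        // Assumption: .NET Core 3.0+ / .NET 5+; not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        int codePage = oemCodePage ?? CultureInfo.CurrentCulture.TextInfo.OEMCodePage;
        Encoding oem = Encoding.GetEncoding(codePage);

        var table = new Dictionary<int, char>();
        for (int code = 1; code <= 255; code++)
        {
            // Decode each byte on its own; skip values that yield no character.
            char[] chars = oem.GetChars(new[] { (byte)code });
            if (chars.Length > 0)
            {
                table[code] = chars[0];
            }
        }
        return table;
    }
}
```

For codepage 437, AltCodeTable.Build(437)[227] is π and AltCodeTable.Build(437)[219] is █. Codes above 255 can first be reduced with unchecked((byte)code) to mirror the overflow behaviour. One caveat: for codes 1 to 31 the .NET encoding returns control characters, whereas the keyboard shortcut on Windows typically produces the old pictographic glyphs (for example ☺ for Alt+1), so those entries may need a hand-written mapping.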

Upvotes: 4
