Reputation: 143
How can I use Unicode symbols in Turbo C++?
I particularly want to superscript and subscript symbols.
I must use the outdated Turbo C++ as this is what my school provides and I have to use this for my project.
Upvotes: 1
Views: 635
Reputation: 51873
The old Turbo C++ (not the BDS 2006 Turbo C++) targets 16-bit MS-DOS, so it supports neither Unicode nor TTF fonts. To get Unicode working, you have two options:
Implement Unicode on your own
So you need to render Unicode text. In graphics mode it is simple. Just use a "complete" Unicode font, like:
IIRC it is a raster font, so it is easy to decode and render (you can even build one big bitmap from it on a newer OS and use that as a font under MS-DOS). For rendering in graphics mode, you can use direct pixel access (VGA/VESA). In text modes it is much harder, because you need to update the EGA/VGA font with the characters you are actually using, and the number of distinct characters shown on screen at once is limited to 256 per font. For more information, see:
Of course, supporting the whole of Unicode is a problem under MS-DOS: the full font, even at a small resolution, is usually 12-64 MB, so you need enough XMS memory (it no longer fits into the 640 KB or 1 MB memory model) and you have to implement fast paging/access to the characters in use to keep it usable. Another option is 32-bit protected mode, where you get 32-bit linear memory access (but then you lose MS-DOS support and have to do all the OS work on your own, although extenders like DOS4GW can do some of it for you).
You could use one shortcut that avoids the memory management: create a RAM disk and store your raw font image as a file on it. File access there should be fast (well, faster than accessing the HDD), so at application start, copy the font from the HDD to the RAM disk and then just use that copy. With this approach, XMS is no longer needed.
Transform Unicode strings into extended ASCII
That is, use a code page that provides your special characters in the extended range outside the standard ASCII table. There are MS-DOS utilities for this (supporting latin1, latin2, kamenicky, ...) which provide the extended font and keyboard handling (within the selected code page).
So you would need a conversion table for all the characters you want to support, mapping between UTF-8 or UTF-16 and your ASCII + extended characters. However, this approach can support only 128 extended characters at a time.
Upvotes: 2
Reputation: 110516
As stated, Turbo C++ won't give you any direct access to Unicode. It is likely so old that it can't even generate code that could use the system's libraries (DLLs), so even by recreating header files by hand, you could not call `wprintf`, which could output proper Unicode even on the arcane `cmd` terminal Microsoft ships with Windows to this day.
However, the default character encoding used in the `cmd` terminal supports some non-ASCII characters. Which ones exactly depends on the language (locale) configuration of your OS: for Western European languages it is usually CP 850, for Central European languages CP 852, and for US English CP 437.
None of these legacy 8-bit code pages includes all ten digits as superscripts, but you might have some available (CP 850 features "¹,²,³", for example).
So, you could check the terminal code page and look up the character codes on Wikipedia. You can inspect and change the current code page with the `chcp` command in the Windows terminal. If your Windows version supports UTF-8, which covers all printable Unicode characters, you have to type `chcp 65001` in the terminal. (I don't know which Windows editions support that, nor which one you are using.)
Once you manage to do that, all you need is to print the byte sequences for the superscript digits in UTF-8, using the "\xHH" escape for characters in a string. (I am not sure whether Turbo C++ allows it. Otherwise, `printf("%c%c", 0xHH, 0xHH)` will work.)
For your convenience, I am attaching the codepoints and UTF-8 encodings for superscripts:
0x00B2: SUPERSCRIPT TWO - ² - utf-8 seq: b'\xc2\xb2'
0x00B3: SUPERSCRIPT THREE - ³ - utf-8 seq: b'\xc2\xb3'
0x00B9: SUPERSCRIPT ONE - ¹ - utf-8 seq: b'\xc2\xb9'
0x0670: ARABIC LETTER SUPERSCRIPT ALEF - ٰ - utf-8 seq: b'\xd9\xb0'
0x0711: SYRIAC LETTER SUPERSCRIPT ALAPH - ܑ - utf-8 seq: b'\xdc\x91'
0x2070: SUPERSCRIPT ZERO - ⁰ - utf-8 seq: b'\xe2\x81\xb0'
0x2071: SUPERSCRIPT LATIN SMALL LETTER I - ⁱ - utf-8 seq: b'\xe2\x81\xb1'
0x2074: SUPERSCRIPT FOUR - ⁴ - utf-8 seq: b'\xe2\x81\xb4'
0x2075: SUPERSCRIPT FIVE - ⁵ - utf-8 seq: b'\xe2\x81\xb5'
0x2076: SUPERSCRIPT SIX - ⁶ - utf-8 seq: b'\xe2\x81\xb6'
0x2077: SUPERSCRIPT SEVEN - ⁷ - utf-8 seq: b'\xe2\x81\xb7'
0x2078: SUPERSCRIPT EIGHT - ⁸ - utf-8 seq: b'\xe2\x81\xb8'
0x2079: SUPERSCRIPT NINE - ⁹ - utf-8 seq: b'\xe2\x81\xb9'
0x207A: SUPERSCRIPT PLUS SIGN - ⁺ - utf-8 seq: b'\xe2\x81\xba'
0x207B: SUPERSCRIPT MINUS - ⁻ - utf-8 seq: b'\xe2\x81\xbb'
0x207C: SUPERSCRIPT EQUALS SIGN - ⁼ - utf-8 seq: b'\xe2\x81\xbc'
0x207D: SUPERSCRIPT LEFT PARENTHESIS - ⁽ - utf-8 seq: b'\xe2\x81\xbd'
0x207E: SUPERSCRIPT RIGHT PARENTHESIS - ⁾ - utf-8 seq: b'\xe2\x81\xbe'
0x207F: SUPERSCRIPT LATIN SMALL LETTER N - ⁿ - utf-8 seq: b'\xe2\x81\xbf'
0xFC5B: ARABIC LIGATURE THAL WITH SUPERSCRIPT ALEF ISOLATED FORM - ﱛ - utf-8 seq: b'\xef\xb1\x9b'
0xFC5C: ARABIC LIGATURE REH WITH SUPERSCRIPT ALEF ISOLATED FORM - ﱜ - utf-8 seq: b'\xef\xb1\x9c'
0xFC5D: ARABIC LIGATURE ALEF MAKSURA WITH SUPERSCRIPT ALEF ISOLATED FORM - ﱝ - utf-8 seq: b'\xef\xb1\x9d'
0xFC63: ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM - ﱣ - utf-8 seq: b'\xef\xb1\xa3'
0xFC90: ARABIC LIGATURE ALEF MAKSURA WITH SUPERSCRIPT ALEF FINAL FORM - ﲐ - utf-8 seq: b'\xef\xb2\x90'
0xFCD9: ARABIC LIGATURE HEH WITH SUPERSCRIPT ALEF INITIAL FORM - ﳙ - utf-8 seq: b'\xef\xb3\x99'
(This was generated with the following Python snippet in interactive mode:)
import unicodedata

for i in range(0x110000):  # every code point, U+0000 through U+10FFFF
    char = chr(i)
    try:
        name = unicodedata.name(char)
    except ValueError:
        continue  # code point has no name: skip it
    if "SUPERSCRIPT" not in name:
        continue
    print(f"0x{i:04X}: {name} - {char} - utf-8 seq: {char.encode('utf-8')}")
Upvotes: 2