Michael Luger
Michael Luger

Reputation: 73

How do I use subscript digits in my C++ program?

I'm currently writing a C++ program that's rather math-involved. As such, I'm trying to denote some objects as having subscript numbers in a wstring member variable of their class. However, attempts at storing these characters in any capacity forces them into their non-subscript counterparts. By contrast, direct uses of the characters that are pasted in the code are maintained as desired. Here are several cases I experimented with:

setlocale(LC_ALL, "");
wchar_t txt = L'\u2080';
wcout << txt << endl;
myfile << txt << endl;

This outputs "0" to both the file and console.

setlocale(LC_ALL, "");
wcout << L"x₀₁" << endl;
myfile << L"x₀₁" << endl;

This outputs "x01" to both the file and console.

setlocale(LC_ALL, "");
wcout << "x₀₁" << endl;
myfile << "x₀₁" << endl;

This outputs "xâ'?â'?" to the console, which I'd like to avoid if possible, and "x₀₁" to the file which is what I want. An ideal program state would be one that property outputs to both the file and to console, but if that’s not possible, then printing non-subscript characters to the console is preferable.

My code intends to convert ints into their corresponding subscripts. How do I manipulate these characters as smoothly as possible without them getting converted back? I suspect that character encoding plays a part, but I do not know how to incorporate Unicode encoding into my program.

Upvotes: 7

Views: 1833

Answers (2)

Ted Lyngmo
Ted Lyngmo

Reputation: 117643

I find these things tricky and I'm never sure if it works for everyone on every Windows version and locale, but this does the trick for me:

#include <Windows.h>
#include <io.h>     // _setmode
#include <fcntl.h>  // _O_U16TEXT

#include <clocale>  // std::setlocale 
#include <iostream>

// Unicode UTF-16, little endian byte order (BMP of ISO 10646)
constexpr char CP_UTF_16LE[] = ".1200";

constexpr wchar_t superscript(int v) {
    constexpr wchar_t offset = 0x2070;       // superscript zero as offset
    if (v == 1) return 0x00B9;               // special case
    if (v == 2 || v == 3) return 0x00B0 + v; // special case 2
    return offset + v;
}

constexpr wchar_t subscript(int v) {
    constexpr wchar_t offset = 0x2080; // subscript zero as offset
    return offset + v;
}

int main() {
    // set these before doing any other output:
    setlocale(LC_ALL, CP_UTF_16LE);
    _setmode(_fileno(stdout), _O_U16TEXT);

    // subscript
    for (int i = 0; i < 10; ++i)
        std::wcout << L'X' << subscript(i) << L' ';
    std::wcout << L'\n';

    // superscript
    for (int i = 0; i < 10; ++i)
        std::wcout << L'X' << superscript(i) << L' ';
    std::wcout << L'\n';    
}

Output:

X₀ X₁ X₂ X₃ X₄ X₅ X₆ X₇ X₈ X₉
X⁰ X¹ X² X³ X⁴ X⁵ X⁶ X⁷ X⁸ X⁹

A more convenient way may be to create wstrings directly. Here wsup and wsub takes a wstring and returns a converted wstring. Characters they can't handle are left unchanged.

#include <Windows.h>
#include <io.h>      // _setmode
#include <fcntl.h>   // _O_U16TEXT

#include <algorithm> // std::transform
#include <clocale>   // std::setlocale 
#include <iostream>

// Unicode UTF-16, little endian byte order (BMP of ISO 10646)
constexpr char CP_UTF_16LE[] = ".1200";

std::wstring wsup(const std::wstring& in) {
    std::wstring rv = in;

    std::transform(rv.begin(), rv.end(), rv.begin(),
        [](wchar_t ch) -> wchar_t {
            // 1, 2 and 3 can be put in any order you like
            // as long as you keep them in the top section
            if (ch == L'1') return 0x00B9;
            if (ch == L'2') return 0x00B2;
            if (ch == L'3') return 0x00B3;

            // ...but this must be here in the middle:
            if (ch >= '0' && ch <= '9') return 0x2070 + (ch - L'0');

            // put the below in any order you like,
            // in the bottom section
            if (ch == L'i') return 0x2071;
            if (ch == L'+') return 0x207A;
            if (ch == L'-') return 0x207B;
            if (ch == L'=') return 0x207C;
            if (ch == L'(') return 0x207D;
            if (ch == L')') return 0x207E;
            if (ch == L'n') return 0x207F;

            return ch; // no change
        });
    return rv;
}

std::wstring wsub(const std::wstring& in) {
    std::wstring rv = in;

    std::transform(rv.begin(), rv.end(), rv.begin(),
        [](wchar_t ch) -> wchar_t {
            if (ch >= '0' && ch <= '9') return 0x2080 + (ch - L'0');
            if (ch == L'+') return 0x208A;
            if (ch == L'-') return 0x208B;
            if (ch == L'=') return 0x208C;
            if (ch == L'(') return 0x208D;
            if (ch == L')') return 0x208E;
            if (ch == L'a') return 0x2090;
            if (ch == L'e') return 0x2091;
            if (ch == L'o') return 0x2092;
            if (ch == L'x') return 0x2093;
            if (ch == 0x0259) return 0x2094; // small letter schwa: ə
            if (ch == L'h') return 0x2095;
            if (ch >= 'k' && ch <= 'n') return 0x2096 + (ch - 'k');
            if (ch == L'p') return 0x209A;
            if (ch == L's') return 0x209B;
            if (ch == L't') return 0x209C;

            return ch; // no change
        });
    return rv;
}

int main() {
    std::setlocale(LC_ALL, CP_UTF_16LE);
    if (_setmode(_fileno(stdout), _O_U16TEXT) == -1) return 1;

    auto pstr = wsup(L"0123456789 +-=() ni");
    auto bstr = wsub(L"0123456789 +-=() aeoxə hklmnpst");

    std::wcout << L"superscript:   " << pstr << L'\n';
    std::wcout << L"subscript:     " << bstr << L'\n';

    std::wcout << L"an expression: x" << wsup(L"(n-1)") << L'\n';
}

Output:

superscript:   ⁰¹²³⁴⁵⁶⁷⁸⁹ ⁺⁻⁼⁽⁾ ⁿⁱ
subscript:     ₀₁₂₃₄₅₆₇₈₉ ₊₋₌₍₎ ₐₑₒₓₔ ₕₖₗₘₙₚₛₜ
an expression: x⁽ⁿ⁻¹⁾

My console didn't manage to display the subscript versions of hklmnpst - but apparently the transformation was correct because it shows up here ok after copy/pasting.

Upvotes: 2

idris
idris

Reputation: 588

You should configure console and the program you open the file that it should interpret your string as its encoding (eg. utf32).

for example in windows you can set your console code page with SetConsoleOutputCP function. to view file different encoding you can add your file to vs solution, right click/open with / source code (text) with encoding than select your encoding.

Upvotes: 0

Related Questions