Daniel Näslund

Reputation: 2410

How to make QChar.unicode() report the UTF-16 representation of combined characters?

I'm trying to write a codec for Code page 437. My plan was to just pass the ASCII characters through and map the remaining 128 characters in a table, using the utf-16 value as key.

For some combined characters (letters with dots, tildes, etc.), the character appears to occupy two QChars.

A test program that prints the utf-16 values for the arguments to the program:

#include <iostream>
#include <QString>

using namespace std;

void print(QString qs)
{
    for (QString::iterator it = qs.begin(); it != qs.end(); ++it)
        cout << hex << it->unicode() << " ";
    cout << "\n";
}

int main(int argc, char *argv[])
{
    for (int i = 1; i < argc; i++)
        print(QString::fromStdString(argv[i]));
}

Some output:

$ ./utf16 Ç ü é
c3 87 
c3 bc 
c3 a9 

I had expected

c387
c3bc
c3a9

I tried the various normalization forms available in QString, but none of them produced a shorter result than the default.

Since QChar is 2 bytes it should be able to hold the value of the characters above in one object. Why does the QString use two QChars? How can I fetch the combined unicode value?

Upvotes: 2

Views: 3521

Answers (2)

MSalters

Reputation: 179779

Just sidestep the problem. See QApplication in Unicode. QApplication::arguments() is already UTF-16 encoded for you, taking local conventions into account.

Upvotes: 0

  1. QString::fromStdString expects an ASCII string and doesn't do any decoding. Use fromLocal8Bit instead.

  2. Your expected output is wrong. For example, Ç is U+00C7, so you should expect C7, not C3 87, which is its UTF-8 encoding.
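
To make point 2 concrete, here is a minimal sketch in plain C++ (no Qt) of how a two-byte UTF-8 sequence maps to a code point. The helper name decodeTwoByteUtf8 is made up for illustration. It only handles the two-byte case, which is an assumption that happens to cover Ç, ü and é, and it does not reject overlong encodings.

```cpp
// Hypothetical helper, not part of Qt: decodes one two-byte UTF-8 sequence.
// Layout of a two-byte sequence: 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy.
// Returns U+FFFD (replacement character) if the bytes do not match that layout.
constexpr char32_t decodeTwoByteUtf8(unsigned char lead, unsigned char trail)
{
    return ((lead & 0xE0) == 0xC0 && (trail & 0xC0) == 0x80)
        // Keep the low 5 bits of the lead byte and the low 6 bits of the trail byte.
        ? static_cast<char32_t>(((lead & 0x1F) << 6) | (trail & 0x3F))
        : static_cast<char32_t>(0xFFFD);
}
```

So the bytes C3 87 are one character, U+00C7, and that value fits in a single QChar. The two QChars in the question are just the two undecoded bytes.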

If you modify main() as below, you get the expected Unicode code points. For each character, the first line lists the local encoding (here: UTF-8), since fromStdString is essentially a no-op and passes every byte straight through. The second line lists the correctly decoded Unicode code point.

$ ./utf16 Ç ü é
c3 87 
c7 
c3 bc 
fc 
c3 a9 
e9

int main(int argc, char *argv[])
{
    for (int i = 1; i < argc; i++) {
        print(QString::fromStdString(argv[i]));
        print(QString::fromLocal8Bit(argv[i]));
    }
}
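
Once the QChars hold real code points, the table approach from the question should work as planned. A rough sketch, assuming only the first three high entries of Code page 437 (0x80 Ç, 0x81 ü, 0x82 é) and taking the value you would get from QChar::unicode(). The names Cp437Entry, kCp437High and toCp437 are hypothetical, and returning -1 for unmapped characters is just one possible convention.

```cpp
#include <cstddef>

// Hypothetical table entry: a UTF-16 code unit and its Code page 437 byte.
struct Cp437Entry {
    char16_t unicode;
    unsigned char cp437;
};

// Only three of the 128 high entries are listed here; a real codec needs all of them.
constexpr Cp437Entry kCp437High[] = {
    {0x00C7, 0x80}, // Ç
    {0x00FC, 0x81}, // ü
    {0x00E9, 0x82}, // é
};

// Returns the CP437 byte for a code unit, or -1 if this sketch has no mapping.
constexpr int toCp437(char16_t codeUnit)
{
    // ASCII passes through unchanged.
    if (codeUnit < 0x80)
        return static_cast<int>(codeUnit);
    // Linear search is fine for a 128-entry table.
    for (std::size_t i = 0; i < sizeof(kCp437High) / sizeof(kCp437High[0]); ++i) {
        if (kCp437High[i].unicode == codeUnit)
            return kCp437High[i].cp437;
    }
    return -1;
}
```

For example, toCp437(0x00C7) yields 0x80, while anything outside the table (such as U+20AC) yields -1.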

Upvotes: 3
