Reputation: 2410
I'm trying to write a codec for code page 437. My plan was to pass the ASCII characters straight through and map the remaining 128 characters in a table, using the UTF-16 value as the key.
For some combined characters (letters with dots, tildes, etcetera), the character appears to occupy two QChars.
Here is a test program that prints the UTF-16 values of its command-line arguments:
#include <iostream>
#include <QString>

using namespace std;

void print(QString qs)
{
    for (QString::iterator it = qs.begin(); it != qs.end(); ++it)
        cout << hex << it->unicode() << " ";
    cout << "\n";
}

int main(int argc, char *argv[])
{
    for (int i = 1; i < argc; i++)
        print(QString::fromStdString(argv[i]));
}
Some output:
$ ./utf16 Ç ü é
c3 87
c3 bc
c3 a9
I had expected
c387
c3bc
c3a9
I tried the various normalization forms available in QString, but none of them produced fewer QChars than the default.
Since QChar is 2 bytes it should be able to hold the value of the characters above in one object. Why does the QString use two QChars? How can I fetch the combined unicode value?
Upvotes: 2
Views: 3521
Reputation: 179779
Just sidestep the problem. QApplication::arguments() returns the command-line arguments as Unicode QStrings, already UTF-16 encoded for you, taking local conventions into account.
Upvotes: 0
Reputation: 98425
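QString::fromStdString expects an ASCII string and doesn't do any decoding, so by default each input byte ends up in its own QChar. This is the Qt 4 behavior; since Qt 5, fromStdString decodes its input as UTF-8. Use fromLocal8Bit instead, which decodes using the locale's encoding.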
Your expected output is wrong. For example, Ç is U+00C7, so you should expect C7, not the UTF-8 encoding C3 87!
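To see how the two bytes C3 87 relate to the single code point C7, here is a minimal hand-rolled UTF-8 decoder in plain C++ (no Qt). The function name decodeUtf8 is made up for this illustration, and the sketch assumes well-formed input with no validation; in real code you would let Qt do the decoding.

```cpp
#include <cassert>   // only needed if you check results with assert()
#include <cstddef>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into Unicode code points.
// Illustrative sketch: assumes well-formed input, performs no validation.
std::vector<char32_t> decodeUtf8(const std::string &bytes)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        int extra;      // number of continuation bytes that follow
        char32_t cp;    // payload bits collected so far
        if (lead < 0x80) {                  // 0xxxxxxx: plain ASCII
            extra = 0; cp = lead;
        } else if ((lead & 0xE0) == 0xC0) { // 110xxxxx: two-byte sequence
            extra = 1; cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) { // 1110xxxx: three-byte sequence
            extra = 2; cp = lead & 0x0F;
        } else {                            // 11110xxx: four-byte sequence
            extra = 3; cp = lead & 0x07;
        }
        ++i;
        // Each continuation byte (10xxxxxx) contributes six more bits.
        for (int k = 0; k < extra && i < bytes.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

For the bytes C3 87, the lead byte contributes 0x03 and the continuation byte 0x07, giving (0x03 << 6) | 0x07 = 0xC7, which is the code point of Ç.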
If you modify main() as below, you get the expected Unicode code points. For each character, the first line lists the bytes of the local encoding (here: UTF-8), since fromStdString is essentially a no-op and passes every byte straight through. The second line lists the correctly decoded Unicode code point.
$ ./utf16 Ç ü é
c3 87
c7
c3 bc
fc
c3 a9
e9
int main(int argc, char *argv[])
{
    for (int i = 1; i < argc; i++) {
        print(QString::fromStdString(argv[i]));
        print(QString::fromLocal8Bit(argv[i]));
    }
}
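Once each character arrives as a single UTF-16 value, the table lookup planned in the question could look roughly like the sketch below. It is plain C++ rather than a real Qt codec, the name toCp437 and the '?' fallback are assumptions made for illustration, and only three of the 128 upper-half entries are filled in.

```cpp
#include <cassert>   // only needed if you check results with assert()
#include <unordered_map>

// Encode one UTF-16 code unit as a CP437 byte.
// Hypothetical helper: returns '?' for anything the partial table lacks.
unsigned char toCp437(char16_t unit)
{
    // Partial table for illustration; a full codec needs all 128 entries.
    static const std::unordered_map<char16_t, unsigned char> table = {
        {0x00C7, 0x80}, // Ç
        {0x00FC, 0x81}, // ü
        {0x00E9, 0x82}, // é
    };

    // ASCII passes straight through unchanged.
    if (unit < 0x80)
        return static_cast<unsigned char>(unit);

    const auto it = table.find(unit);
    return it != table.end() ? it->second
                             : static_cast<unsigned char>('?');
}
```

With this, toCp437(0x00C7) yields 0x80 and toCp437(u'A') yields 'A', while an unmapped value such as 0x20AC falls back to '?'.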
Upvotes: 3