A.B.

Reputation: 16630

Strings of 4-byte characters on Windows?

I have a program that does various operations on char types in std::string, for example

if (my_string.front() == my_char) {
    // do stuff with my_string
}

I'm looking for some practical advice on how to make my program support Unicode. I need to be able to compare characters to characters, which I believe requires 4-byte characters so that every Unicode code point, including the largest ones, can be processed without loss.

I'm on Windows with GCC and have read that in this case wchar_t, and therefore each element of std::wstring, is 2 bytes. C++11 has std::u32string with 4-byte elements, but it seems largely unsupported by the rest of the standard library.

What's the easiest solution in this case?

Upvotes: 1

Views: 306

Answers (2)

fjardon

Reputation: 7996

Even if you had a string of 32-bit integers, you could not just compare them one by one: the same visible character can be encoded as different code point sequences, so you would first have to normalize the strings. Normalization is NOT simple, so you will end up using a library like ICU anyway. You may as well use it directly :)

http://site.icu-project.org/
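
To illustrate why element-by-element comparison is not enough, here is a minimal sketch using only the standard library. The names precomposed, decomposed, naive_equal and demo_naive_comparison are made up for this example. It stores "é" in two canonically equivalent forms and shows that a plain std::u32string comparison treats them as different; making them compare equal would need a normalization step (for example via ICU), which is not shown here.

```cpp
#include <cassert>
#include <string>

// "é" as a single precomposed code point (U+00E9).
const std::u32string precomposed = U"\u00E9";
// "é" as 'e' (U+0065) followed by a combining acute accent (U+0301).
const std::u32string decomposed = U"e\u0301";

// Hypothetical helper: naive code-point-by-code-point equality,
// with no normalization applied.
bool naive_equal(const std::u32string& a, const std::u32string& b) {
    return a == b;
}

// Illustrative check: both strings display the same character,
// yet they differ in length and compare unequal.
void demo_naive_comparison() {
    assert(precomposed.size() == 1);
    assert(decomposed.size() == 2);
    assert(!naive_equal(precomposed, decomposed));
}
```

So 4-byte elements alone give you one element per code point, but not one element per user-visible character.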

Upvotes: 2

compie

Reputation: 10536

Windows uses the UTF-16 encoding: http://en.wikipedia.org/wiki/UTF-16

You don't need "four-byte characters" to support all Unicode symbols. UTF-16 is a variable-length encoding: code points above U+FFFF are stored as a pair of 16-bit code units (a surrogate pair).
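
As a rough sketch of what "variable length" means in practice, the following standard C++ converts a single code point to UTF-16 code units and back. The helper names encode_utf16, decode_utf16 and self_check are hypothetical, and the code assumes well-formed input (a valid code point that is not itself a surrogate); real code would normally rely on the Windows API or a library such as ICU instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encode one code point as UTF-16 code units.
// Assumes cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary plane: split into a high and a low surrogate.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}

// Hypothetical helper: decode the code point starting at index i
// and advance i past it. An unpaired surrogate is returned as-is.
char32_t decode_utf16(const std::u16string& s, std::size_t& i) {
    const char16_t u = s[i++];
    if (u >= 0xD800 && u <= 0xDBFF && i < s.size()) {
        const char16_t lo = s[i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            ++i;
            return 0x10000 + ((static_cast<char32_t>(u) - 0xD800) << 10)
                           + (lo - 0xDC00);
        }
    }
    return u;
}

// Illustrative check: U+20AC fits in one unit, U+1F600 needs two.
void self_check() {
    assert(encode_utf16(0x20AC).size() == 1);
    const std::u16string pair = encode_utf16(0x1F600);
    assert(pair.size() == 2);
    std::size_t i = 0;
    assert(decode_utf16(pair, i) == 0x1F600);
}
```

With a decoder like this, characters can be compared as char32_t values while the string itself stays in UTF-16.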

Good reading material: http://www.joelonsoftware.com/articles/Unicode.html

Upvotes: 1
