Reputation: 409
How to arrange correct processing of Unicode strings using pure C++?
What I mean is, when you put your unicode string into std::string and count its length, sometimes you get like 10 characters for 5-chars-long string.
How do they do it in serious open-source programs? How do they do it in a cross-platform manner? How do you tie it to file i/o and stdin/stdout streams?
Thanks.
Upvotes: 2
Views: 379
Reputation: 147018
ICU is currently the Unicode library. If you want cross-platform Unicode support, ICU is basically the only place to get it.
If only its interface wasn't more unfriendly than the wrong end of an automatic shotgun.
Upvotes: 1
Reputation: 49850
There's Boost.Locale, which is written in C++, wraps the ICU library, and provides a nice, non-alien interface to it.
For Unicode work, my first choice would be Boost.Locale, followed by ICU directly (if there is something that Boost.Locale doesn't wrap yet).
Upvotes: 5
Reputation:
I've used wxWidgets to do this. It makes for easy conversion from std::string to their string type wxString. It's not ideal, but it works well, is simple and portable.
Upvotes: 0
Reputation: 76785
std::[w]string
, contrary to popular belief, has no Unicode support whatsoever. They both operate only on [w]char[_t]
units, in an encoding agnostic way.
If you only need basic Unicode support in the form of length and conversions and encoding verification, there is utfcpp, which provides a beautiful C++ interface for these operations.
Application frameworks like Qt and wxWdigets do provide their own string
classes, which offer better Unicode support, but often tying you to use the whole framework throughout your code.
Aside from that, there is ICU, which is the standard Unicode implementation around today.
A work in progress by one of the C++ masters on this website is ogonek. you can surely contact the author through the Lounge<C++>
StackOverflow chat room to ask for details on his progress.
Upvotes: 4