Reputation: 3327
What is the best practice of Unicode processing in C++?
Upvotes: 109
Views: 46411
Reputation: 1728
As has been said above a library is the best bet when using a large system. However some times you do want to handle things your self (maybe because the library would use to many resources like on a micro controller). In this case you want a simple library that you can copy the parts out of for the things you actually need.
Willow Schlanger's example code seems like a good one (see his answer for more details).
I also found another one that has smaller code, but lacks full error checking and only handles UTF-8 but was simpler to take parts out of.
Here's a list of the embedded libraries that seem decent.
Upvotes: 1
Reputation: 1603
If you don't care about backwards compatibility with previous C++ standards, the current C++11 standard has built in Unicode support: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2011/n3242.pdf
So the truly best practice for Unicode processing in C++ would be to use the built in facilities for it. That isn't always a possibility with older code bases though, with the standard being so new at present.
EDIT: To clarify, C++11 is Unicode aware in that it now has support for Unicode literals and Unicode strings. However, the standard library has only limited support for Unicode processing and conversion. For your current needs this may be enough. However, if you need to do a large amount of heavy lifting right now then you may still need to use something like ICU for more in-depth processing. There are some proposals currently in the works to include more robust support for text conversion between different encodings. My guess (and hope) is that this will be part of the next technical report.
Upvotes: 10
Reputation: 17486
is_alpha
unless that is the definition you want.string
if you care about correctness, always use your unicode library for this.Upvotes: 81
Reputation: 29
Although this may not be best practice for everyone, you can write your own C++ UNICODE routines if you want!
I just finished doing it over a weekend. I learned a lot, though I don't guarantee it's 100% bug free, I did a lot of testing and it seems to work correctly.
My code is under the New BSD license and can be found here:
http://code.google.com/p/netwidecc/downloads/list
It is called WSUCONV and comes with a sample main() program that converts between UTF-8, UTF-16, and Standard ASCII. If you throw away the main code, you've got a nice library for reading / writing UNICODE.
Upvotes: 2
Reputation: 4986
Our company (and others) use the open source Internation Components for Unicode (ICU) library originally developed by Taligent.
It handles strings, locales, conversions, date/times, collation, transformations, et. al.
Start with the ICU Userguide
Upvotes: 8
Reputation: 14084
Look at Case insensitive string comparison in C++
That question has a link to the Microsoft documentation on Unicode: http://msdn.microsoft.com/en-us/library/cc194799.aspx
If you look on the left-hand navigation side on MSDN next to that article, you should find a lot of information pertaining to Unicode functions. It is part of a chapter on "Encoding Characters" (http://msdn.microsoft.com/en-us/library/cc194786.aspx)
It has the following subsections:
Upvotes: 3
Reputation: 34345
Here is a checklist for Windows programming:
Upvotes: 5