Reputation: 4252
The std::iostream classes lack specializations for char16_t and char32_t, and boost::format depends on streams. What can I replace streams with for UTF-16 strings (preferably with localization support)?
Upvotes: 2
Views: 1523
Reputation: 154005
The fundamental entities streams work on are characters, not encoded characters. For Unicode, it was decided that one character can be split across multiple entities (code units), which makes it inherently incompatible with the stream abstraction.
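To make the "one character, several entities" point concrete, here is a minimal sketch. The helper name count_code_points is made up for illustration, and it assumes well-formed UTF-16 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-16 string.
// Assumes well-formed input, i.e. every high surrogate (0xD800..0xDBFF)
// is followed by a low surrogate (0xDC00..0xDFFF).
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t c : s) {
        // A low surrogate is the second half of a pair, so it does not
        // start a new code point and is not counted.
        if (c < 0xDC00 || c > 0xDFFF) {
            ++n;
        }
    }
    return n;
}
```

For a string such as u"a\U0001F600", size() reports three char16_t units while the helper reports two code points. A stream that hands out one char16_t at a time can therefore stop in the middle of a character.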
The new character types were intended to provide a standard way to deal with Unicode characters, but it was deemed too complex to also redo the behavior of IOStreams and locales to keep up with the added complexities. This is partly due to people not quite loving streams and partly due to it being a large and non-trivial task. I would think that the required facets can be defined so that they are capable of dealing with simple situations, but I'm not sure whether this would result in a fast solution or whether it would cover the languages where Unicode is needed: I can see how it can be made to work for European text, but I don't know whether things would really work for Asian text.
Upvotes: 2
Reputation: 19114
This is good. The encodings argument is over and pretty much settled: you do not want UTF-16 strings anywhere in your program except when communicating with legacy APIs, at which point you convert the whole formatted string, best done with boost::nowide::narrow and boost::nowide::widen. The exception, of course, is when you are doing some rare edge-case optimizations.
See http://utf8everywhere.org.
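A rough sketch of that approach: format with ordinary narrow streams in UTF-8, then convert the finished string once at the API boundary. The names utf8_to_utf16 and format_price are hypothetical, and the hand-written decoder stands in for a library conversion routine. It assumes well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-16.
// Assumes `in` is well-formed UTF-8; malformed input is not detected.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i++]);
        std::uint32_t cp = 0;
        int extra = 0;  // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        for (int k = 0; k < extra && i < in.size(); ++k, ++i) {
            // Each continuation byte contributes its low six bits.
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Formats in UTF-8 with a narrow stream, converting only the final result.
std::u16string format_price(const std::string& item, int euros) {
    std::ostringstream os;
    os << item << ": " << euros << " \xE2\x82\xAC";  // UTF-8 bytes of U+20AC
    return utf8_to_utf16(os.str());
}
```

With this split, boost::format and the locale machinery keep working on char strings, and only the call site of the UTF-16 API sees std::u16string.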
Upvotes: 1
Reputation: 24351
The current streams are usually implemented as templates (I don't have a copy of the standard here, but I'm pretty sure that they have to be implemented as templates), so making them wide-string aware should be a simple matter of instantiating the templates with the appropriate character type.
Most likely your implementation already has predefined instantiations for wide strings. Have a look for something like std::wstringstream.
That said, the various character types in C++ don't make any assumptions about the encoding of the strings you put in them, so you'd have to handle this by convention: your wide strings are encoded as UTF-16 because you say so, but nothing in the library enforces it.
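As a small illustration of the predefined wide-character streams, the sketch below formats into a std::wstring. The function name describe is made up for the example. Note that wchar_t is 16 bits on Windows and typically 32 bits elsewhere, so treating the result as UTF-16 is a convention that only fits some platforms.

```cpp
#include <sstream>
#include <string>

// std::wostringstream is std::basic_ostringstream<wchar_t>, one of the
// instantiations the standard library provides out of the box.
std::wstring describe(const std::wstring& name, int count) {
    std::wostringstream os;
    os << name << L" has " << count << L" items";
    return os.str();
}
```

Calling describe(L"cart", 2) yields the wide string L"cart has 2 items". The stream only moves wchar_t values around and never checks what encoding they represent.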
Upvotes: 0