Reputation: 3378
Can somebody explain why this doesn't work?
template < typename T > struct tester { static const size_t value = 0; };
template <> struct tester< char > { static const size_t value = 1; };
template <> struct tester< unsigned short > { static const size_t value = 2; };
size_t nTest = tester< wchar_t >::value;
On my compiler, wchar_t is typedef'd as unsigned short. Why is the default template being used when the underlying type has a specialization?
Edit: OK, so I was wrong about it being a typedef; IntelliSense was showing me something else. My cross-platform and surrogate questions remain, though.
This has thrown me a curveball because I want to work with wchar_t depending on its size.
Another related question: how can I work with wchar_t in a cross-platform manner? I know it's 16 bits on Windows, and elsewhere it can be 32 bits. If it's defined as a 32-bit type, does that mean it doesn't (as in compiler-forced) use surrogate pairs?
Would something like this work?
template < typename T, size_t N = sizeof( wchar_t ) > struct remap;
template <> struct remap< wchar_t, 2 > { typedef unsigned short type; };
template <> struct remap< wchar_t, 4 > { typedef unsigned long type; };
Upvotes: 0
Views: 287
Reputation: 597051
You can simplify your template to the following; you don't need to specialize it:
template < typename T > struct tester { static const size_t value = sizeof(T); };
On platforms where sizeof(wchar_t) is 2, UTF-16 is typically used, so surrogates apply.
On platforms where sizeof(wchar_t) is 4, UTF-32 is typically used, so surrogates do not apply.
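As a rough sketch of what that difference means in practice, the helper below appends one code point to a std::wstring and only emits a surrogate pair when wchar_t is too narrow to hold the value directly. The name append_code_point is hypothetical, and the code assumes wchar_t really holds UTF-16 or UTF-32, which the standard does not guarantee. It checks WCHAR_MAX rather than sizeof so it does not depend on the number of bits in a byte.

```cpp
#include <cwchar>   // WCHAR_MAX
#include <string>

// Hypothetical helper: appends one Unicode code point to a wide string.
// Assumes cp is a valid scalar value (<= 0x10FFFF, not itself a surrogate)
// and that wchar_t is encoded as UTF-16 or UTF-32 on this platform.
inline void append_code_point(std::wstring& out, unsigned long cp) {
    if (WCHAR_MAX >= 0x10FFFF || cp < 0x10000UL) {
        // Wide enough for any code point, or the value fits in one unit.
        out.push_back(static_cast<wchar_t>(cp));
    } else {
        // Narrow wchar_t: split into a UTF-16 high/low surrogate pair.
        cp -= 0x10000UL;
        out.push_back(static_cast<wchar_t>(0xD800UL + (cp >> 10)));
        out.push_back(static_cast<wchar_t>(0xDC00UL + (cp & 0x3FFUL)));
    }
}
```

With this, a code point above U+FFFF ends up as one element where wchar_t can hold it and as two elements otherwise, so code that indexes the string by element has to be aware of which case it is in.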
Upvotes: 1
Reputation: 88215
The C++ standard specifies that wchar_t is a unique type and not a typedef. On some non-conforming implementations, or with some implementation-defined option, it may be a typedef, but you cannot rely on this or rely on it being typedef'd to any particular type in portable code.
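A minimal sketch of why the question's code picks the primary template, assuming a conforming compiler with C++11's <type_traits> available. The constant wchar_is_distinct is a name made up for this illustration; the tester template is the one from the question.

```cpp
#include <cstddef>
#include <type_traits>

// The templates from the question.
template <typename T> struct tester { static const std::size_t value = 0; };
template <> struct tester<char> { static const std::size_t value = 1; };
template <> struct tester<unsigned short> { static const std::size_t value = 2; };

// Hypothetical constant for illustration: true when wchar_t is a type of its
// own rather than an alias for unsigned short (expected on conforming compilers).
const bool wchar_is_distinct = !std::is_same<wchar_t, unsigned short>::value;

// Because wchar_t is distinct, no specialization matches it and the primary
// template (value 0) is selected, regardless of sizeof(wchar_t).
const std::size_t nTest = tester<wchar_t>::value;
```

If you want a particular value for wchar_t, it needs its own specialization, or the template has to be keyed on a property such as size or range instead of on type identity.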
Yes, your remap specializations will work; remap<wchar_t>::type will be unsigned long on platforms with a four-byte wchar_t and unsigned short on platforms with a two-byte wchar_t. Of course, wchar_t is not limited to those two sizes, and a two-byte value doesn't mean it's 16 bits, etc. If you want to write portable code based on the largest value wchar_t can hold, you might look at WCHAR_MAX or one of the options shown in the comments, rather than sizeof(wchar_t).
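One way the WCHAR_MAX approach could look, as a sketch only: the names wide_storage and wchar_storage_t are invented here, and the code assumes C++11's <cstdint> and that wchar_t needs no more than 32 bits.

```cpp
#include <cwchar>    // WCHAR_MAX
#include <cstdint>   // std::uint_least16_t, std::uint_least32_t

// Hypothetical trait: selects an unsigned storage type from the range of
// values wchar_t can hold, rather than from sizeof(wchar_t).
template <bool FitsIn16Bits> struct wide_storage;
template <> struct wide_storage<true>  { typedef std::uint_least16_t type; };
template <> struct wide_storage<false> { typedef std::uint_least32_t type; };

// Assumption: wchar_t never needs more than 32 bits on the target platforms.
typedef wide_storage<(WCHAR_MAX <= 0xFFFF)>::type wchar_storage_t;
```

Using the least-width types avoids tying the result to unsigned long, whose size also varies between platforms.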
"If it's defined as a 32 bit type does that mean it doesn't (as in compiler forced) use surrogate pairs?"
The standard doesn't actually specify wchar_t in a way that's particularly useful for the things people usually want to use it for. The intent of wchar_t is to provide a type where any character in the current locale will be represented as a single wchar_t value, in order to enable easier text processing. As such, wchar_t 1) isn't required to use the same encoding in all locales, and 2) isn't really permitted to use surrogate pairs.
Windows' use of UTF-16 sort of gets around that second point by not supporting any locale that contains characters requiring surrogate pairs. This means that portable code won't deal with surrogate pairs on any platform, including Windows.
But yes, on platforms with a 32-bit-wide wchar_t, UTF-32 is a common wchar_t encoding for many locales, especially if you stick to locales that use UTF-8 as the char encoding.
Upvotes: 2