template specialization for wchar_t

Question

Can somebody explain why this doesn't work?

template < typename T > struct tester { static const size_t value = 0; };
template <> struct tester< char > { static const size_t value = 1; };
template <> struct tester< unsigned short > { static const size_t value = 2; };

size_t nTest = tester< wchar_t >::value;

On my compiler, wchar_t is typedef'd as unsigned short. Why is the default template being used when the underlying type has a specialization?

Edit: OK, so I was wrong about it being a typedef; IntelliSense was showing me something else. My cross-platform and surrogate questions remain, though.

This has thrown me a curveball because I want to work with wchar_t depending on its size.

Another related question: how can I work with wchar_t in a cross-platform manner? I know it's 16 bits on Windows, and elsewhere it can be 32 bits. If it's defined as a 32-bit type, does that mean it doesn't (as in compiler forced) use surrogate pairs?

Would something like this work?

template < typename T, size_t N = sizeof( wchar_t ) > struct remap;
template <> struct remap< wchar_t, 2 > { typedef unsigned short type; };
template <> struct remap< wchar_t, 4 > { typedef unsigned long type; };

bames53 · Accepted Answer

The C++ standard specifies that wchar_t is a unique type and not a typedef. On some non-conforming implementations, or with some implementation-defined option, it may be a typedef, but you cannot rely on this or rely on it being typedef'd to any particular type in portable code.
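To make that concrete, here is a minimal sketch that reuses the tester template from the question and adds an explicit specialization for wchar_t itself. The value 3 is an arbitrary choice for illustration and is not part of the original code. It assumes a conforming compiler; where wchar_t really is a typedef for unsigned short, the extra specialization would be rejected as a redefinition.

```cpp
#include <cstddef>
#include <type_traits>

// Primary template and the two specializations from the question.
template <typename T> struct tester { static const std::size_t value = 0; };
template <> struct tester<char> { static const std::size_t value = 1; };
template <> struct tester<unsigned short> { static const std::size_t value = 2; };

// Added for illustration: wchar_t is its own type, so it needs (and is
// allowed to have) its own specialization next to unsigned short.
template <> struct tester<wchar_t> { static const std::size_t value = 3; };

// On a conforming implementation wchar_t is never the same type as
// unsigned short, even when both have the same size and range.
static_assert(!std::is_same<wchar_t, unsigned short>::value,
              "wchar_t is expected to be a distinct type");
static_assert(tester<wchar_t>::value == 3,
              "the wchar_t specialization is the one selected");
```

Without the wchar_t specialization, tester< wchar_t >::value falls back to the primary template and yields 0, which is the behaviour described in the question.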

Yes, your remap specializations will work; remap< wchar_t >::type will be unsigned long on platforms with a four-byte wchar_t and unsigned short on platforms with a two-byte wchar_t. Of course, wchar_t is not limited to those two sizes, and a two-byte value doesn't necessarily mean 16 bits, etc. If you want to write portable code based on the largest value wchar_t can hold, you might look at WCHAR_MAX or one of the options shown in the comments, rather than sizeof(wchar_t).
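As a rough sketch of the WCHAR_MAX approach, the selection can be keyed on the range of wchar_t instead of its size. The names wide_unit and wide_unit_t are made up for this example. It uses the least-width types from <cstdint> rather than unsigned long, since unsigned long is 8 bytes on typical 64-bit Linux systems, and it only distinguishes "fits in 16 bits" from "everything else".

```cpp
#include <cstdint>
#include <cwchar>   // WCHAR_MAX
#include <limits>

// Hypothetical trait: pick an unsigned type from the range of wchar_t.
template <bool FitsIn16Bits> struct wide_unit;
template <> struct wide_unit<true>  { typedef std::uint_least16_t type; };
template <> struct wide_unit<false> { typedef std::uint_least32_t type; };

// Select on the largest value wchar_t can hold, not on sizeof(wchar_t).
typedef wide_unit<(WCHAR_MAX <= 0xFFFF)>::type wide_unit_t;

// Guard: refuse to compile if wchar_t is wider than this sketch handles,
// rather than silently truncating values.
static_assert(std::numeric_limits<wide_unit_t>::max() >=
                  static_cast<unsigned long long>(WCHAR_MAX),
              "wide_unit_t cannot hold every wchar_t value");
```

With a 16-bit wchar_t this should select std::uint_least16_t, and with a 32-bit wchar_t std::uint_least32_t.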

"If it's defined as a 32 bit type does that mean it doesn't (as in compiler forced) use surrogate pairs?"

The standard doesn't actually specify wchar_t in a way that's particularly useful for the things people usually want to use it for. The intent of wchar_t is to provide a type where any character in the current locale will be represented as a single wchar_t value, in order to enable easier text processing. As such, wchar_t 1) isn't required to use the same encoding in all locales and 2) doesn't really permit surrogate pairs.

Windows' use of UTF-16 sort of gets around that second point by not supporting any locales that contain characters requiring surrogate pairs, which means that portable code won't deal with surrogate pairs on any platform, including Windows.

But yes, on platforms with a 32-bit wchar_t, UTF-32 is a common wchar_t encoding for many locales, especially if you stick to locales that use UTF-8 as the char encoding.
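For reference, this is the arithmetic that a surrogate pair stands for, sketched with hypothetical helper names. A UTF-32 wchar_t stores the resulting code point directly in one unit, so code working on such a platform never needs this step.

```cpp
#include <cstdint>

// Hypothetical helpers, not part of any standard library.
// UTF-16 reserves 0xD800-0xDBFF for high (lead) surrogates and
// 0xDC00-0xDFFF for low (trail) surrogates.
constexpr bool is_high_surrogate(std::uint32_t unit) {
    return unit >= 0xD800u && unit <= 0xDBFFu;
}

constexpr bool is_low_surrogate(std::uint32_t unit) {
    return unit >= 0xDC00u && unit <= 0xDFFFu;
}

// Combine a valid high/low pair into the code point it encodes.
// Each surrogate carries 10 bits; the result is offset by 0x10000.
// The caller is assumed to have validated both units first.
constexpr std::uint32_t combine_surrogates(std::uint32_t high, std::uint32_t low) {
    return 0x10000u + ((high - 0xD800u) << 10) + (low - 0xDC00u);
}

// U+1F600 is encoded in UTF-16 as the pair D83D DE00.
static_assert(combine_surrogates(0xD83Du, 0xDE00u) == 0x1F600u,
              "surrogate pair decodes to U+1F600");
```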
