Twifty

Reputation: 3378

template specialization for wchar_t

Can somebody explain why this doesn't work?

template < typename T > struct tester { static const size_t value = 0; };
template <> struct tester< char > { static const size_t value = 1; };
template <> struct tester< unsigned short > { static const size_t value = 2; };

size_t nTest = tester< wchar_t >::value;

On my compiler, wchar_t is typedef'd as unsigned short. Why is the default template being used when the underlying type has a specialization?

Edit: OK, so I was wrong about it being a typedef. IntelliSense was showing me something else. My cross-platform and surrogate questions remain, though.

This has thrown me a curveball because I want to work with wchar_t depending on its size.

A related question: how can I work with wchar_t in a cross-platform manner? I know it's 16 bits on Windows, and elsewhere it can be 32 bits. If it's defined as a 32-bit type, does that mean it doesn't (as in compiler forced) use surrogate pairs?

Would something like this work?

template < typename T, size_t N = sizeof( wchar_t ) > struct remap;
template <> struct remap< wchar_t, 2 > { typedef unsigned short type; };
template <> struct remap< wchar_t, 4 > { typedef unsigned long type; };

Upvotes: 0

Views: 287

Answers (2)

Remy Lebeau

Reputation: 597051

You can simplify your template to the following; you don't need to specialize it. Note that sizeof already yields the size in bytes, so no division is needed:

template < typename T > struct tester { static const size_t value = sizeof(T); };

On platforms where sizeof(wchar_t) is 2 (Windows, for example), UTF-16 is typically used, so surrogates apply.

On platforms where sizeof(wchar_t) is 4, UTF-32 is typically used, so surrogates do not apply.
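
For the 2-byte case, here is a minimal sketch of what "surrogates apply" means in practice. The helper names (is_high_surrogate, is_low_surrogate, combine_surrogates) are hypothetical and not part of any standard API; the code units are passed as plain integers so the sketch behaves the same whatever sizeof(wchar_t) is.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only.
// A UTF-16 high (lead) surrogate lies in [0xD800, 0xDBFF].
constexpr bool is_high_surrogate(std::uint32_t unit) {
    return unit >= 0xD800 && unit <= 0xDBFF;
}

// A UTF-16 low (trail) surrogate lies in [0xDC00, 0xDFFF].
constexpr bool is_low_surrogate(std::uint32_t unit) {
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

// Combine a high/low pair into the code point it encodes.
// Assumes the caller has already checked both units with the helpers above.
constexpr std::uint32_t combine_surrogates(std::uint32_t hi, std::uint32_t lo) {
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}
```

With a 4-byte wchar_t holding UTF-32, each element is already a whole code point, so none of this is needed.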

Upvotes: 1

bames53

Reputation: 88215

The C++ standard specifies that wchar_t is a unique type and not a typedef. On some non-conforming implementations, or with some implementation-defined option, it may be a typedef, but you cannot rely on this or rely on it being typedef'd to any particular type in portable code.
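
As a rough illustration of that point, the sketch below repeats the tester template from the question and adds an explicit specialization for wchar_t itself. The value 3 is an arbitrary choice for illustration. This assumes a conforming compiler; on an implementation where wchar_t really is a typedef (for example MSVC with /Zc:wchar_t-), the extra specialization would clash with the unsigned short one.

```cpp
#include <cstddef>
#include <type_traits>

// Primary template and specializations as in the question.
template <typename T> struct tester { static const std::size_t value = 0; };
template <> struct tester<char> { static const std::size_t value = 1; };
template <> struct tester<unsigned short> { static const std::size_t value = 2; };

// Added for illustration: wchar_t needs its own specialization.
// The value 3 is arbitrary.
template <> struct tester<wchar_t> { static const std::size_t value = 3; };

// On a conforming compiler wchar_t is a distinct type, even when it has
// the same size and signedness as unsigned short.
static_assert(!std::is_same<wchar_t, unsigned short>::value,
              "wchar_t is not a typedef for unsigned short");
static_assert(tester<wchar_t>::value == 3,
              "the wchar_t specialization is selected, not unsigned short");
```

Without the wchar_t specialization, tester<wchar_t> falls back to the primary template, which is the behavior the question describes.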

Yes, your remap specializations will work; remap<wchar_t>::type will be unsigned long on platforms with a four-byte wchar_t and unsigned short on platforms with a two-byte wchar_t. Of course wchar_t is not limited to those two sizes, and a two-byte value doesn't mean it's 16 bits, etc. If you want to write portable code based on the largest value wchar_t can hold, you might look at WCHAR_MAX or one of the options shown in the comments, rather than sizeof(wchar_t).
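
A minimal sketch of the WCHAR_MAX approach might look like the following. The names remap_by_range and wchar_storage are hypothetical, and the sketch assumes wchar_t holds no more than 32 bits of value, which the first static_assert checks.

```cpp
#include <cstdint>
#include <cwchar>      // WCHAR_MAX
#include <limits>

// Hypothetical helper: choose an unsigned type from the range of wchar_t
// instead of from sizeof(wchar_t).
template <bool FitsIn16Bits> struct remap_by_range;
template <> struct remap_by_range<true>  { typedef std::uint_least16_t type; };
template <> struct remap_by_range<false> { typedef std::uint_least32_t type; };

// Assumption of this sketch: wchar_t values fit in 32 bits.
static_assert(static_cast<unsigned long long>(WCHAR_MAX) <= 0xFFFFFFFFull,
              "this sketch only handles wchar_t values up to 32 bits");

// Unsigned storage type wide enough for every non-negative wchar_t value.
typedef remap_by_range<(WCHAR_MAX <= 0xFFFF)>::type wchar_storage;

static_assert(std::numeric_limits<wchar_storage>::max() >=
                  static_cast<unsigned long long>(WCHAR_MAX),
              "wchar_storage can hold WCHAR_MAX");
```

Selecting on the value range rather than the byte count avoids assuming 8-bit bytes, which is the caveat mentioned above.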

"If it's defined as a 32 bit type does that mean it doesn't (as in compiler forced) use surrogate pairs?"

The standard doesn't actually specify wchar_t in a way that's particularly useful for the things people usually want to use it for. The intent of wchar_t is to provide a type where any character in the current locale will be represented as a single wchar_t value, in order to enable easier text processing. As such, 1) wchar_t isn't required to use the same encoding in all locales and 2) surrogate pairs aren't really permitted.

Windows' use of UTF-16 sort of gets around that second point by not supporting any locale that contains characters requiring surrogate pairs. That means portable code won't deal with surrogate pairs on any platform, including Windows.

But yes, on platforms with a 32-bit wchar_t, UTF-32 is a common wchar_t encoding for many locales, especially if you stick to locales that use UTF-8 as the char encoding.

Upvotes: 2
