dom_beau

Char string encoding differences between native C++ and C++/CLI?

I have a strange problem for which I believe there is a solution but I cannot find it. Your help would be appreciated.

On the one hand, I have a native C++ class named Native with a static wchar_t array containing accented characters. This array is const and defined at build time.

/// Header file
class Native
{
public:
    static const wchar_t* Array() { return mArray; }

private:
    static const wchar_t *mArray;
};

//--------------------------------------------------------------

/// .cpp file
const wchar_t* Native::mArray = {L"This is a description éàçï"};

On the other hand, I have a C++/CLI class that uses the array like this:

/// C++/CLI use
System::String^ S1 = gcnew System::String( Native::Array() );
System::String^ S2 = gcnew System::String( L"This is a description éàçï" );

The problem is that while S2 gives "This is a description éàçï" as expected, S1 gives "This is a description éà çï". I do not understand why passing a pointer to a static array does not give the same result as passing the same literal directly.

I guess this is an encoding problem, but I would have expected the same results for S1 and S2. Do you know how to solve it? In my program I must use it like S1, i.e., by accessing the build-time static array through a static method that returns a const wchar_t*.

Thanks for your help!


EDIT 1

What is the best way to define literals at build time in C++ using Intel C++ 13.0 so that they are directly usable in the C++/CLI System::String constructor? This may be the real question behind my problem.


Answers (1)

Tim

I don't have enough reputation to add a comment to ask this question, so I apologize for posting this as an answer if that seems inappropriate.

Could the problem be that your compiler defines wchar_t to be 8 bits? I'm basing that possibility on this answer:

Should I use wchar_t when using UTF-8?

To answer your question (in the comments) about building a UTF-16 array at build time: I believe you can force UTF-16 by using u"..." for your literal instead of L"..." (see http://en.cppreference.com/w/cpp/language/string_literal).

Edit 1: For what it's worth, I tried your code (after fixing a couple of compile errors) using Microsoft Visual Studio 10 and didn't have the same problem (both strings printed as expected).

I don't know if it will help, but another possible way to statically initialize this wchar_t array is to wrap your literal in a std::wstring and set your array to the C-string pointer returned by wstring::c_str(), as follows:

#include <string>
// ws must be defined before mArray in the same .cpp file
std::wstring ws(L"This is a description éàçï");
const wchar_t* Native::mArray = ws.c_str();

This edit was inspired by Dynamic wchar_t array (C++ beginner)
