Reputation: 6675
I have a .NET based Excel addin that uses a C++/CLI library to read/write proprietary files. The C++/CLI library links to some core C++ libraries that provide classes to read and write these files. The core classes use std::string and std::i/ofstream to read/write data in proprietary files.
So when saving data, it goes from:
Excel >> .NET AddIn (string) >> C++/CLI Lib (System::String) >> C++ Core Lib (std::string)
Everything works fine with plain text (ASCII) files. Now I have a text file (ANSI encoding) containing some Japanese characters, saved on a Japanese machine. I think it uses the Shift-JIS encoding by default. This file LOADS fine (I see the same characters in Excel as I see in Notepad), but if I save it back unmodified, the characters change to ??. I think it's because the std::string and std::ofstream classes are writing it incorrectly as a simple ASCII stream.
I use the following syntax while reading the file to convert them to .NET strings:
%String(mystring.c_str());
and the following while converting them from .NET strings to std::strings while writing:
msclr::interop::marshal_as<std::string>(mydotnetstring)
The problem seems to be with encoding, but I am not crystal clear on what exactly is happening. I want to understand WHY the file is READ CORRECTLY but not written correctly.
I have modified my application to read/write UTF-8 and that solves the problem but I still want to know the underlying problem.
Upvotes: 0
Views: 780
Reputation: 6675
Okay, I think I have found the underlying problem. The msclr::interop::marshal_as<std::string> method calls the WideCharToMultiByte API internally with the CP_THREAD_ACP option, which means that the code page of the active THREAD is used. This .NET addin runs inside the Excel process, where the current thread's code page differs from the system default ANSI code page (932, i.e. Shift-JIS, on a Japanese system; the thread was apparently using 1252). Reading is not affected, since the String(const char*) constructor decodes using the system default ANSI code page rather than the thread's. I verified this by comparing the return value of the marshal_as call in a sample application with the one in the .NET addin on a Japanese machine: the sample application converted a two-character Japanese string to 4 bytes, whereas the addin converted it to just 2 unknown '?' bytes.
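The difference in byte counts can be reproduced with a toy model. The sketch below is only an illustration and not how Windows implements code pages: `CodePageTable`, `MakeShiftJisSubset` and `Encode` are made-up names, and the table holds just the two Shift-JIS entries for the characters in "日本". Anything missing from the table is replaced with '?', which mimics the default-character substitution that WideCharToMultiByte performs when the target code page cannot represent a character.

```cpp
#include <map>
#include <string>

// Toy stand-in for a code page: maps a UTF-16 code unit to its encoded bytes.
// Real code pages have thousands of entries; this is a hypothetical subset.
using CodePageTable = std::map<char16_t, std::string>;

// Shift-JIS (code page 932) byte sequences for the two characters in "日本".
CodePageTable MakeShiftJisSubset()
{
    return {
        { u'\u65E5', "\x93\xFA" },  // 日
        { u'\u672C', "\x96\x7B" },  // 本
    };
}

// Encodes text with the given table. ASCII passes through unchanged in this
// toy model; any other code unit missing from the table becomes '?'.
std::string Encode(const std::u16string& text, const CodePageTable& table)
{
    std::string out;
    for (char16_t unit : text) {
        if (unit < 0x80) {
            out.push_back(static_cast<char>(unit));
            continue;
        }
        auto it = table.find(unit);
        out += (it != table.end()) ? it->second : std::string("?");
    }
    return out;
}
```

With the Shift-JIS subset, `Encode(u"\u65E5\u672C", MakeShiftJisSubset())` yields the 4 bytes 93 FA 96 7B, while an empty table (standing in for a code page with no Japanese characters) yields the 2 bytes "??", which matches what the two processes produced.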
SOLUTION
marshal_as does not offer a way to change this code page, so the solution is to marshal .NET strings by calling the WideCharToMultiByte API directly with the CP_ACP option. It worked for me.
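A minimal sketch of such a conversion, written as a native helper, might look like the following. `WideToSystemAnsi` is a hypothetical name and not part of msclr. The code assumes the input is short enough for its length to fit in an `int`, and it is guarded with `_WIN32` only so that the file still compiles on other platforms.

```cpp
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: converts UTF-16 text to the system default ANSI
// code page (CP_ACP) instead of the calling thread's code page.
std::string WideToSystemAnsi(const std::wstring& wide)
{
    if (wide.empty())
        return std::string();

    // First call: ask how many bytes the converted text needs.
    const int wideLen = static_cast<int>(wide.size());
    const int byteLen = ::WideCharToMultiByte(CP_ACP, 0, wide.data(), wideLen,
                                              nullptr, 0, nullptr, nullptr);
    if (byteLen <= 0)
        throw std::runtime_error("WideCharToMultiByte could not size the buffer");

    // Second call: convert into a buffer of exactly that size.
    std::string narrow(static_cast<size_t>(byteLen), '\0');
    const int written = ::WideCharToMultiByte(CP_ACP, 0, wide.data(), wideLen,
                                              &narrow[0], byteLen, nullptr, nullptr);
    if (written != byteLen)
        throw std::runtime_error("WideCharToMultiByte could not convert the text");

    return narrow;
}
#endif  // _WIN32
```

From C++/CLI, one way to feed this helper would be to obtain a wide string first with msclr::interop::marshal_as<std::wstring>(mydotnetstring), which involves no code-page conversion, and then pass the result to the helper.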
Upvotes: 0