A9S6

Reputation: 6675

Japanese characters are not written correctly when saving to a file

I have a .NET based Excel addin that uses a C++/CLI library to read/write proprietary files. The C++/CLI library links to some core C++ libraries that provide classes to read and write these files. The core classes use std::string and std::i/ofstream to read/write data in proprietary files.

So when saving data, it goes from:
Excel >> .NET AddIn (string) >> C++/CLI Lib (System::String) >> C++ Core Lib (std::string)

Everything works fine with plain ASCII text files. Now I have a text file (ANSI encoding) containing some Japanese characters, saved on a Japanese machine. I think it uses the Shift-JIS encoding by default. This file LOADS fine (I see the same characters in Excel as in Notepad), but if I save it back unmodified, the characters change to ??. I think it's because the std::string and std::ofstream classes are writing it incorrectly as a plain ASCII stream.

I use the following syntax while reading the file to convert them to .NET strings:

%String(mystring.c_str());

and the following while converting them from .NET strings to std::strings while writing:

msclr::interop::marshal_as<std::string>(mydotnetstring)

The problem seems to be with encoding, but I am not clear on what exactly is happening. I want to understand WHY the file is READ CORRECTLY but not written correctly.

I have modified my application to read/write UTF-8 and that solves the problem but I still want to know the underlying problem.

Upvotes: 0

Views: 780

Answers (1)

A9S6

Reputation: 6675

Okay, I think I have found the underlying problem. The msclr::interop::marshal_as<std::string> method internally calls the WideCharToMultiByte API with the CP_THREAD_ACP option, which means the code page of the active THREAD is used. This .NET add-in runs inside the Excel process, where the current thread's code page is not the same as the system default ANSI code page; the two code pages involved are 932 (Shift-JIS, the Japanese ANSI code page) and 1252 (Western European). I verified this by comparing the return value of the marshal_as call in a sample application with that in the .NET add-in on a Japanese machine: the sample application converted a string of two Japanese characters to 4 bytes, whereas the add-in converted it to just 2 unknown '?' bytes. Reading presumably worked because the System::String constructor that takes a char pointer decodes with the system default ANSI code page, not the thread's.
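To make the 4-byte versus 2-byte observation concrete, here is a toy model of a lossy wide-to-multibyte conversion. It is only a sketch: the names toy_narrow, kToyShiftJis and kToyWestern are made up for illustration, the tables hold just the two characters of "日本", and the real WideCharToMultiByte uses full code page tables. The point is only that a character missing from the target code page collapses to a single replacement byte.

```cpp
#include <map>
#include <string>

// Toy "code page": maps a UTF-16 code unit to its multibyte encoding.
// Only the entries needed for this demonstration are listed (assumption:
// a real code page table is far larger).
using CodePageTable = std::map<char16_t, std::string>;

// Shift-JIS (code page 932) encodings of the two characters in "日本".
const CodePageTable kToyShiftJis = {
    {u'\u65E5', "\x93\xFA"},  // 日
    {u'\u672C', "\x96\x7B"},  // 本
};

// A Western code page has no entry for either character.
const CodePageTable kToyWestern = {};

// Hypothetical helper: converts UTF-16 text using the given toy table.
std::string toy_narrow(const std::u16string& text, const CodePageTable& table) {
    std::string out;
    for (char16_t unit : text) {
        if (unit < 0x80) {
            out.push_back(static_cast<char>(unit));  // ASCII passes through
            continue;
        }
        CodePageTable::const_iterator it = table.find(unit);
        if (it != table.end()) {
            out += it->second;   // representable: emit its multibyte form
        } else {
            out.push_back('?');  // not representable: replacement character
        }
    }
    return out;
}
```

With this model, toy_narrow(u"\u65E5\u672C", kToyShiftJis) yields the 4 bytes 93 FA 96 7B, while the same call with kToyWestern yields "??", and the original characters cannot be recovered from that output.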

SOLUTION
marshal_as does not provide a way to change this code page, so the solution is to marshal .NET strings by calling the WideCharToMultiByte API directly with the CP_ACP option. It worked for me.
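A minimal native sketch of that approach could look like the following. The helper name narrow_with_system_acp is hypothetical, and it takes a std::wstring rather than a System::String; in C++/CLI the UTF-16 characters would first have to be obtained from the managed string (for example with PtrToStringChars from <vcclr.h> and a pin_ptr, which is an assumption about how the caller is set up). The #ifdef _WIN32 guards are there only so the sketch still compiles on other platforms.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts UTF-16 text to the system default ANSI
// code page (CP_ACP) instead of the thread's code page (CP_THREAD_ACP).
std::string narrow_with_system_acp(const std::wstring& wide) {
    if (wide.empty()) {
        return std::string();  // nothing to convert
    }
#ifdef _WIN32
    const int wide_len = static_cast<int>(wide.size());

    // First call: ask how many bytes the converted text needs.
    const int byte_len = WideCharToMultiByte(
        CP_ACP, 0, wide.data(), wide_len, nullptr, 0, nullptr, nullptr);
    if (byte_len <= 0) {
        throw std::runtime_error("WideCharToMultiByte failed to size the buffer");
    }

    // Second call: convert into a buffer of exactly that size.
    std::string narrow(static_cast<std::size_t>(byte_len), '\0');
    const int written = WideCharToMultiByte(
        CP_ACP, 0, wide.data(), wide_len, &narrow[0], byte_len, nullptr, nullptr);
    if (written != byte_len) {
        throw std::runtime_error("WideCharToMultiByte failed to convert the text");
    }
    return narrow;
#else
    throw std::runtime_error("narrow_with_system_acp requires the Windows API");
#endif
}
```

Passing the last argument (lpUsedDefaultChar) as a BOOL pointer instead of nullptr would also make it possible to detect when a character was replaced, rather than silently writing '?' to the file.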

Upvotes: 0
