Reputation: 53
I am trying to migrate code from VC++ to .NET. The VC++ code uses the MultiByteToWideChar and WideCharToMultiByte functions provided by the WinAPI. I tried using the System.Text.Encoding class in .NET, but it does not work for all encodings. Is there another way to do this conversion? What is wrong with the code snippet below?
Here is my C# code:
public static string MultiByteToWideChar(string input, int codepage)
{
    Encoding e1 = Encoding.GetEncoding(codepage);
    Encoding e2 = Encoding.Unicode;
    //byte[] source = e1.GetBytes(input);
    byte[] source = MBCSToByte(input);
    byte[] target = Encoding.Convert(e1, e2, source);
    return e2.GetString(target);
}

public static string WideCharToMultiByte(string input, int codepage)
{
    Encoding e1 = Encoding.Unicode;
    Encoding e2 = Encoding.GetEncoding(codepage);
    byte[] source = e1.GetBytes(input);
    byte[] target = Encoding.Convert(e1, e2, source);
    return Encoding.GetEncoding(codepage).GetString(target);
}

private static byte[] MBCSToByte(string s)
{
    byte[] b = new byte[s.Length];
    int i = 0;
    foreach (char c in s)
        b[i++] = (byte)c;
    return b;
}
MultiByteToWideChar works only for codepage 1255 and not for 866.
WideCharToMultiByte does not work for codepage 1251.
Upvotes: 0
Views: 2232
Reputation: 595402
MultiByteToWideChar() converts encoded bytes (NOT characters!) to Unicode characters.
WideCharToMultiByte() converts Unicode characters to encoded bytes (NOT characters!).
In .NET, the string type is always a sequence of Unicode characters (encoded as UTF-16). So using a string to hold encoded bytes is just plain wrong.
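To make the bytes-versus-characters distinction concrete, here is a minimal sketch that encodes one string in different code pages and prints the resulting bytes as hex. The class and method names are made up for illustration. It assumes a modern .NET runtime (.NET Core 3.0 or later), where legacy code pages are only available after registering CodePagesEncodingProvider; on the classic .NET Framework those code pages are built in.

```csharp
using System;
using System.Text;

static class BytesVersusChars
{
    // Hypothetical helper: encodes the text in the given code page and
    // returns the encoded bytes as a hex string such as "CF-F0".
    public static string ToHex(string text, int codepage)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // need this provider. Registering the same provider again is a no-op.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] encoded = Encoding.GetEncoding(codepage).GetBytes(text);
        return BitConverter.ToString(encoded);
    }
}
```

For the Russian word "Привет" (written as "\u041F\u0440\u0438\u0432\u0435\u0442" to avoid depending on the source file encoding), ToHex returns CF-F0-E8-E2-E5-F2 for codepage 1251 and 8F-E0-A8-A2-A5-E2 for codepage 866. The characters are the same; only the bytes differ, which is why the bytes cannot be treated as characters.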
In your MultiByteToWideChar() function, you are assuming that the input string contains Unicode characters that are 16-bit representations of codepage-encoded 8-bit bytes. You are copying those Unicode characters as-is (truncated to 8 bits each) into a byte[] array, then converting that presumably codepage-encoded array to a UTF-16 byte[] array, and then converting that to a UTF-16 string. This works if and only if the initial assumption is true, which is usually not the case unless your input was corrupted to begin with.
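The assumption can be checked with a small sketch. TruncateChars below mirrors the logic of MBCSToByte from the question; WidenBytes is a hypothetical helper that builds the only kind of string for which that truncation is lossless, namely one where every char is just a byte value widened to 16 bits.

```csharp
using System;

static class ByteSmuggling
{
    // Same logic as the question's MBCSToByte: keep only the low 8 bits of each char.
    public static byte[] TruncateChars(string s)
    {
        byte[] b = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
            b[i] = unchecked((byte)s[i]);
        return b;
    }

    // Hypothetical helper: widen each encoded byte to a char without decoding it.
    public static string WidenBytes(byte[] bytes)
    {
        char[] c = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
            c[i] = (char)bytes[i];
        return new string(c);
    }
}
```

If the string was produced by something like WidenBytes from the codepage 866 bytes 8F-E0-A8-A2-A5-E2, TruncateChars gives those bytes back and the decode succeeds. If the string holds real Unicode text such as "Привет" (U+041F U+0440 U+0438 U+0432 U+0435 U+0442), TruncateChars yields 1F-40-38-32-35-42, which is a control character followed by "@825B" in any ASCII-compatible code page, so the original text is gone before the decoding step even starts.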
In your WideCharToMultiByte() function, you are converting the input string to a UTF-16 byte[] array, then converting that array to a codepage-encoded byte[] array. So far so good (though you could just use Encoding.GetBytes() to go from the UTF-16 string directly to the codepage-encoded byte[] without using Encoding.Convert() at all). But then you are using the same codepage to convert the codepage-encoded byte[] array back to a UTF-16 string, thus undoing everything you had done. The output string will have the same value as the input string (provided the specified codepage supports all of the Unicode characters in the input string; otherwise you will have data loss during the first codepage conversion).
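The round trip described above can be reproduced with a short sketch. EncodeThenDecode is a hypothetical name; it does in two calls what the question's WideCharToMultiByte does in four. As before, the provider registration assumes .NET Core 3.0 or later and is not needed on the classic .NET Framework.

```csharp
using System;
using System.Text;

static class RoundTrip
{
    // Encodes the text into the code page and immediately decodes it again,
    // which is effectively what the question's WideCharToMultiByte returns.
    public static string EncodeThenDecode(string text, int codepage)
    {
        // Assumption: legacy code pages must be registered on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding e = Encoding.GetEncoding(codepage);
        return e.GetString(e.GetBytes(text));
    }
}
```

With codepage 1251, "Привет" comes back unchanged, so the function appears to do nothing. With codepage 1252, which has no Cyrillic letters, the result is no longer equal to the input; with the default fallback each unsupported character is typically replaced by "?".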
That being said, the correct code should look more like this instead:
public static string MultiByteToWideChar(byte[] input, int codepage)
{
    return Encoding.GetEncoding(codepage).GetString(input);
}

public static byte[] WideCharToMultiByte(string input, int codepage)
{
    return Encoding.GetEncoding(codepage).GetBytes(input);
}
Don't use a string to hold encoded bytes; use an actual byte[] array instead.
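As an illustration of working purely with byte[] arrays, here is a sketch that re-encodes data from one code page to another. Transcode is a hypothetical helper, not part of the original answer; it relies on Encoding.Convert, which decodes the source bytes to Unicode and re-encodes them in the target encoding. The provider registration is again an assumption about running on .NET Core 3.0 or later.

```csharp
using System;
using System.Text;

static class CodePageConverter
{
    // Hypothetical helper: re-encodes bytes from one code page into another.
    public static byte[] Transcode(byte[] input, int fromCodepage, int toCodepage)
    {
        // Assumption: legacy code pages must be registered on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding source = Encoding.GetEncoding(fromCodepage);
        Encoding target = Encoding.GetEncoding(toCodepage);

        // Encoding.Convert goes through Unicode internally, so no string
        // ever has to hold the encoded bytes.
        return Encoding.Convert(source, target, input);
    }
}
```

Passing the codepage 866 bytes 8F-E0-A8-A2-A5-E2 with fromCodepage 866 and toCodepage 1251 yields CF-F0-E8-E2-E5-F2, the same word "Привет" in the other encoding.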
Upvotes: 1
Reputation: 63722
A string is a string of characters, not a byte stream. You already lost when you wrapped your binary data in a string.
If you want proper conversions between encodings, make sure to use byte[]. A string already gives meaning to those bytes. .NET's string isn't the same thing as C's char*. Keep string for strings, and use byte[] for persistence, networking, etc.
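One way to follow this advice for persistence is to keep the encoding step at the I/O boundary, as in the sketch below. LegacyTextFile and its methods are hypothetical names; the sketch assumes .NET Core 3.0 or later for the provider registration and uses only File.ReadAllBytes and File.WriteAllBytes for the actual I/O.

```csharp
using System;
using System.IO;
using System.Text;

static class LegacyTextFile
{
    // Encodes the text in the given code page and writes the raw bytes to disk.
    public static void Save(string path, string text, int codepage)
    {
        // Assumption: legacy code pages must be registered on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        File.WriteAllBytes(path, Encoding.GetEncoding(codepage).GetBytes(text));
    }

    // Reads the raw bytes from disk and decodes them using the given code page.
    public static string Load(string path, int codepage)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codepage).GetString(File.ReadAllBytes(path));
    }
}
```

Inside the program the text only ever exists as a string; bytes appear only where the data crosses into the file. Saving "Привет" with codepage 1251 produces a 6-byte file, and loading it with the same codepage returns the original string.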
Upvotes: 1