user10820813


Unicode Hex String to String

I have a Unicode hex string like this:

0030003100320033

It should turn into "0123". This is a simple case where the result is just "0123", but the input can also contain other text and non-ASCII Unicode characters. How can I convert this kind of Unicode hex string back to a string in C#?

For ordinary US-ASCII characters, the first byte is always 00, so 0031 is "1", 0032 is "2", and so on.

When it's an actual non-ASCII Unicode character, such as Arabic or Chinese, the first byte is not 00. Arabic characters, for instance, are 06XX, like 0663.

I need to be able to turn this type of hex string into a normal, decoded C# string.
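To see where this format comes from, it can help to run the conversion in the opposite direction. Below is a minimal sketch (the names `HexUtf16Demo` and `ToHexUtf16` are hypothetical, not part of any library) that encodes text as UTF-16 big-endian and writes each byte as two hex digits. For example, "1" is code point U+0031 (decimal 49), so it becomes 0031, while the Arabic-Indic digit three, U+0663, becomes 0663.

```csharp
using System;
using System.Text;

public static class HexUtf16Demo
{
    // Encodes text as UTF-16 big-endian, then renders every byte as two
    // uppercase hex digits: four hex digits per UTF-16 code unit.
    public static string ToHexUtf16(string text)
    {
        byte[] bytes = Encoding.BigEndianUnicode.GetBytes(text);
        // BitConverter.ToString yields "00-30-00-31-..."; strip the dashes.
        return BitConverter.ToString(bytes).Replace("-", "");
    }
}
```

With this, `HexUtf16Demo.ToHexUtf16("0123")` should return `0030003100320033`, which matches the input shown above, so decoding is just this process in reverse.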

Upvotes: 0

Views: 1272

Answers (3)

pim3nt3l

Reputation: 96

You could try this solution:

public static void Main()
{
    string hexString = "0030003100320033"; // pairs of hex digits, two bytes per character
    //string hexStrWithDash = "00-30-00-31-00-32-00-33"; // hex pairs separated by dashes, as produced by BitConverter.ToString()
    byte[] data = ParseHex(hexString);
    string result = System.Text.Encoding.BigEndianUnicode.GetString(data);
    Console.WriteLine("Data: {0}", result);
}

public static byte[] ParseHex(string hexString)
{
    hexString = hexString.Replace("-", "");
    byte[] output = new byte[hexString.Length / 2];
    for (int i = 0; i < output.Length; i++)
    {
        output[i] = Convert.ToByte(hexString.Substring(i * 2, 2), 16);
    }
    return output;
}
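On .NET 5 and later, the hand-written `ParseHex` loop can likely be replaced by the built-in `Convert.FromHexString`, which accepts upper- or lowercase hex and throws a `FormatException` on odd-length or non-hex input. A minimal sketch, assuming .NET 5+; the wrapper name `HexUtf16FromHex.Decode` is hypothetical:

```csharp
using System;
using System.Text;

public static class HexUtf16FromHex
{
    // Requires .NET 5 or later for Convert.FromHexString.
    // Dashes (as produced by BitConverter.ToString) are removed first.
    public static string Decode(string hexString)
    {
        byte[] data = Convert.FromHexString(hexString.Replace("-", ""));
        // Each pair of bytes is one big-endian UTF-16 code unit.
        return Encoding.BigEndianUnicode.GetString(data);
    }
}
```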

Upvotes: -1

Slai

Reputation: 22886

A shorter, though less efficient, alternative:

// needs: using System.Text.RegularExpressions;
Regex.Replace("0030003100320033", "....", m => ((char)Convert.ToInt32(m.Value, 16)).ToString());
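The same idea also works without regular expressions: each group of four hex digits is one UTF-16 code unit, which maps directly to one .NET `char`. A minimal sketch (`HexUtf16Loop.Decode` is a hypothetical name) that parses each group and appends it to a `StringBuilder`:

```csharp
using System;
using System.Text;

public static class HexUtf16Loop
{
    // Treats every 4 hex digits as one UTF-16 code unit (one .NET char).
    public static string Decode(string hex)
    {
        if (hex.Length % 4 != 0)
            throw new FormatException("Length must be a multiple of 4.");

        var sb = new StringBuilder(hex.Length / 4);
        for (int i = 0; i < hex.Length; i += 4)
        {
            // Parse one 16-bit code unit in base 16 and append it as a char.
            ushort unit = Convert.ToUInt16(hex.Substring(i, 4), 16);
            sb.Append((char)unit);
        }
        return sb.ToString();
    }
}
```

Because it works on code units, a surrogate pair such as D83D DE00 is simply appended as two chars, which together form U+1F600.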

Upvotes: 1

Douglas

Reputation: 54917

There are several encodings that can represent Unicode, of which UTF-8 is today's de facto standard. However, your example is actually a hex representation of UTF-16 text in big-endian byte order: each group of four hex digits is one 16-bit code unit. You can convert your hex string back into bytes, then use Encoding.BigEndianUnicode to decode them:

public static void Main()
{
    var bytes = StringToByteArray("0030003100320033");
    var decoded = System.Text.Encoding.BigEndianUnicode.GetString(bytes);
    Console.WriteLine(decoded);   // gives "0123"
}

// https://stackoverflow.com/a/311179/1149773
public static byte[] StringToByteArray(string hex)
{
    byte[] bytes = new byte[hex.Length / 2];
    for (int i = 0; i < hex.Length; i += 2)
        bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
    return bytes;
}

Since a .NET Char represents a single UTF-16 code unit, this answer should give identical results to Slai's for well-formed input, including surrogate pairs.
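To check the surrogate-pair claim, here is a small sketch (the class `SurrogateCheck` is hypothetical) that decodes D83DDE00, the UTF-16 encoding of U+1F600, both ways. For ill-formed input such as a lone surrogate, the two approaches may differ: `Encoding.BigEndianUnicode` should substitute the replacement character U+FFFD by default, while the regex version keeps the raw code unit.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class SurrogateCheck
{
    // Bytes + Encoding.BigEndianUnicode (this answer's approach).
    public static string ViaEncoding(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < hex.Length; i += 2)
            bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
        return Encoding.BigEndianUnicode.GetString(bytes);
    }

    // One char per 4 hex digits (Slai's approach).
    public static string ViaRegex(string hex) =>
        Regex.Replace(hex, "....", m => ((char)Convert.ToInt32(m.Value, 16)).ToString());
}
```

Both calls with `"D83DDE00"` should return the same two-char string, and `char.ConvertToUtf32(result, 0)` should give `0x1F600`.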

Upvotes: 3
