Reputation:
I have a unicode string like this:
0030003100320033
Which should turn into 0123. This is a simple case of 0123 string, but there are some string and unicode chars as well. How can I turn this type of unicode hex string to string in C#?
For normal US charset, first part is always 00, so 0031 is "1" in ASCII, 0032 is "2" and so on.
When its actual unicode char, like Arabic and Chinese, first part is not 00, for instance for Arabic its 06XX, like 0663.
I need to be able to turn this type of Hex string into C# decimal string.
Upvotes: 0
Views: 1272
Reputation: 96
You should try this solution
public static void Main()
{
string hexString = "0030003100320033"; //Hexa pair numeric values
//string hexStrWithDash = "00-30-00-31-00-32-00-33"; //Hexa pair numeric values separated by dashed. This occurs using BitConverter.ToString()
byte[] data = ParseHex(hexString);
string result = System.Text.Encoding.BigEndianUnicode.GetString(data);
Console.Write("Data: {0}", result);
}
public static byte[] ParseHex(string hexString)
{
hexString = hexString.Replace("-", "");
byte[] output = new byte[hexString.Length / 2];
for (int i = 0; i < output.Length; i++)
{
output[i] = Convert.ToByte(hexString.Substring(i * 2, 2), 16);
}
return output;
}
Upvotes: -1
Reputation: 22886
Shorter less efficient alternative:
Regex.Replace("0030003100320033", "....", m => (char)Convert.ToInt32(m + "", 16) + "");
Upvotes: 1
Reputation: 54917
There are several encodings that can represent Unicode, of which UTF-8 is today's de facto standard. However, your example is actually a string representation of UTF-16 using the big-endian byte order. You can convert your hex string back into bytes, then use Encoding.BigEndianUnicode
to decode this:
public static void Main()
{
var bytes = StringToByteArray("0030003100320033");
var decoded = System.Text.Encoding.BigEndianUnicode.GetString(bytes);
Console.WriteLine(decoded); // gives "0123"
}
// https://stackoverflow.com/a/311179/1149773
public static byte[] StringToByteArray(string hex)
{
byte[] bytes = new byte[hex.Length / 2];
for (int i = 0; i < hex.Length; i += 2)
bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
return bytes;
}
Since Char
in .NET represents a UTF-16 code unit, this answer should give identical results to Slai's, including for surrogate pairs.
Upvotes: 3