Reputation: 11

Converting unicode character to a single hexadecimal value in C#

I am getting a character from a emf record using Encoding.Unicode.GetString and the resulting string contains only one character but has two bytes. I don't have any idea about the encoding scheme and the multi byte character set. I want to convert that character to its equivalent single hexadecimal value.Can you help me regarding this..

Upvotes: 1

Answers (4)

Hien Nguyen

Reputation: 18973

I created an extension method to convert unicode or non-unicode string to hex string.

I shared for whom concern.

public static class StringHelper
    {
        public static string ToHexString(this string str)
        {
            byte[] bytes = str.IsUnicode() ? Encoding.UTF8.GetBytes(str) : Encoding.Default.GetBytes(str);

            return BitConverter.ToString(bytes).Replace("-", string.Empty);
        }

        public static bool IsUnicode(this string input)
        {
            const int maxAnsiCode = 255;

            return input.Any(c => c > maxAnsiCode);
        }
}

Upvotes: 0

Jon Skeet

Reputation: 1503459

It's not clear what you mean. A char in C# is a 16-bit unsigned value. If you've got a binary data source and you want to get Unicode characters, you should use an Encoding to decode the binary data into a string, that you can access as a sequence of char values.

You can convert a char to a hex string by first converting it to an integer, and then using the X format specifier like this:

char = '\u0123';
string hex = ((int)c).ToString("X4"); // Now hex = "0123"

Now, that leaves one more issue: surrogate pairs. Values which aren't in the Basic Multilingual Plane (U+0000 to U+FFFF) are represented by two UTF-16 code units - a high surrogate and a low surrogate. You can use the char.IsSurrogate* methods to check for surrogate pairs... although it's harder (as far as I can see) to then convert a surrogate pair into a UCS-4 value. If you're lucky, you won't need to deal with this... if you're happy converting your binary data into a sequence of UTF-16 code units instead of strict UCS-4 values, you don't need to worry.

EDIT: Given your comments, it's still not entirely clear what you've got to start with. You say you've got two bytes... are they separate, or in a byte array? What do they represent? Text in a particular encoding, presumably... but which encoding? Once you know the encoding, you can convert a byte array into a string easily:

byte[] bytes = ...;
// For example, if your binary data is UTF-8
string text = Encoding.UTF8.GetString(bytes);
char firstChar = text[0];
string hex = ((int)firstChar).ToString("X4");

If you could edit your question to give more details about your actual situation, it would be a lot easier to help you get to a solution. If you're generally confused about encodings and the difference between text and binary data, you might want to read my article about it.

Upvotes: 6

Ana Betts

Reputation: 74692

Get thee to StringInfo:

http://msdn.microsoft.com/en-us/library/system.globalization.stringinfo.aspx

http://msdn.microsoft.com/en-us/library/8k5611at.aspx

The .NET Framework supports text elements. A text element is a unit of text that is displayed as a single character, called a grapheme. A text element can be a base character, a surrogate pair, or a combining character sequence. The StringInfo class provides methods that allow your application to split a string into its text elements and iterate through the text elements. For an example of using the StringInfo class, see String Indexing.

Upvotes: 0

codekaizen

Reputation: 27429

Try this:

System.Text.Encoding.Unicode.GetBytes(theChar.ToString())
     .Aggregate("", (agg, val) => agg + val.ToString("X2"));

However, since you don't specify exactly what encoding that the character is in, this could fail. Futher, you don't make it very clear if you want the output to be a string of hex chars or bytes. I'm guessing the former, since I'd guess you want to generate HTML. Let me know if any of this is wrong.

Upvotes: 2

Converting unicode character to a single hexadecimal value in C#

Answers (4)

Related Questions