I have this code to help decode the Unicode escape sequences for an emoji:
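using System.Globalization;
using System.Text.RegularExpressions;
public string DecodeEncodedNonAsciiCharacters(string value)
{
    // Replace each \uXXXX escape (4 hex digits) with the UTF-16 char it encodes
    return Regex.Replace(
        value,
        @"\\u(?<Value>[0-9a-fA-F]{4})",
        m => ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString());
}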
So I call it like this:
Console.WriteLine(DecodeEncodedNonAsciiCharacters("\uD83C\uDFCB\uD83C\uDFFF\u200D\u2642\uFE0F"));
which prints this emoji: 🏋🏿‍♂️. My question is: how can I turn this
"\uD83C\uDFCB\uD83C\uDFFF\u200D\u2642\uFE0F"
into these code points?
U+1F3CB, U+1F3FF, U+200D, U+2642, U+FE0F
(The code points above are from Emojipedia.org.)
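A side note on the question's method: in a regular C# literal such as "\uD83C\uDFCB..." the compiler has already turned every \uXXXX into a UTF-16 char, so the regex finds no backslashes and DecodeEncodedNonAsciiCharacters returns its input unchanged. The regex only does real work when the text literally contains backslash-u sequences, as in a verbatim @"..." literal or text read from a file. The sketch below illustrates this; the class name EscapeDemo and its two constants are hypothetical names chosen for illustration.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical demo class; EscapeDemo and both constants are illustrative names.
public static class EscapeDemo
{
    // The question's method, with the stray semicolon removed and the pattern
    // restricted to hexadecimal digits so int.Parse cannot fail on a match.
    public static string DecodeEncodedNonAsciiCharacters(string value) =>
        Regex.Replace(
            value,
            @"\\u(?<Value>[0-9a-fA-F]{4})",
            m => ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString());

    // Regular literal: the compiler already decoded each escape into one UTF-16 char,
    // so this string holds 7 chars and no backslashes.
    public const string CompilerDecoded = "\uD83C\uDFCB\uD83C\uDFFF\u200D\u2642\uFE0F";

    // Verbatim literal: the backslashes survive, giving 42 ASCII chars
    // that the regex actually has to decode.
    public const string LiteralEscapes = @"\uD83C\uDFCB\uD83C\uDFFF\u200D\u2642\uFE0F";
}
```

Decoding LiteralEscapes yields exactly CompilerDecoded, while decoding CompilerDecoded returns it unchanged.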
It seems that you want to combine each surrogate pair into a single UTF-32 code point:
\uD83C\uDFCB => \U0001F3CB
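The arithmetic behind that combination is small: the high surrogate carries the upper 10 bits and the low surrogate the lower 10 bits of (code point - 0x10000). Here is a minimal sketch with a hypothetical SurrogateMath.Combine helper, shown only to make the formula concrete; in practice char.ConvertToUtf32, used in the code below, does the same job.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class SurrogateMath
{
    // code point = 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
    // The high surrogate holds the upper 10 bits, the low surrogate the lower 10 bits.
    public static int Combine(char high, char low)
    {
        if (!char.IsHighSurrogate(high) || !char.IsLowSurrogate(low))
            throw new ArgumentException("Not a valid high/low surrogate pair.");
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }
}
```

For the first pair: 0x10000 + (0x3C << 10) + 0x3CB = 0x10000 + 0xF000 + 0x3CB = 0x1F3CB.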
If that's your case, you can do it like this:
Code:
using System;
using System.Collections.Generic;
using System.Linq;
public static IEnumerable<int> CombineSurrogates(string value) {
  if (null == value)
    yield break; // or throw new ArgumentNullException(nameof(value));
  for (int i = 0; i < value.Length; ++i) {
    char current = value[i];
    char next = i < value.Length - 1 ? value[i + 1] : '\0';
    // A high surrogate followed by a low surrogate encodes one code point above U+FFFF
    if (char.IsSurrogatePair(current, next)) {
      yield return char.ConvertToUtf32(current, next);
      i += 1; // skip the low surrogate we just consumed
    }
    else
      yield return (int)current; // BMP character (or lone surrogate): the char value itself
  }
}
public static string DecodeEncodedNonAsciiCharacters(string value) =>
  string.Join(" ", CombineSurrogates(value).Select(code => $"U+{code:X4}"));
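On .NET Core 3.0 or later there is also a built-in way to do the pairing: String.EnumerateRunes yields one System.Text.Rune per Unicode scalar value, and Rune.Value is the code point. This is a swapped-in alternative to the loop above, not part of the original answer, and the class name RuneCodePoints is hypothetical. One behavioral difference: EnumerateRunes reports a lone surrogate as U+FFFD (Rune.ReplacementChar), whereas CombineSurrogates returns the surrogate's own value.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical class name; requires .NET Core 3.0+ for String.EnumerateRunes.
public static class RuneCodePoints
{
    // EnumerateRunes walks the string one Unicode scalar value at a time,
    // pairing surrogates itself; Rune.Value is the code point as an int.
    public static IEnumerable<int> CodePoints(string value) =>
        value.EnumerateRunes().Select(rune => rune.Value);

    // Same "U+XXXX" formatting as the answer, space-separated.
    public static string Format(string value) =>
        string.Join(" ", CodePoints(value).Select(code => $"U+{code:X4}"));
}
```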
Demo:
string data = "\uD83C\uDFCB\uD83C\uDFFF\u200D\u2642\uFE0F";
// If you want the numeric codes, uncomment the line below
//int[] codes = CombineSurrogates(data).ToArray();
string result = DecodeEncodedNonAsciiCharacters(data);
Console.Write(result);
Outcome:
U+1F3CB U+1F3FF U+200D U+2642 U+FE0F
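The question lists the code points comma-separated, as Emojipedia does. Since CombineSurrogates already returns plain int codes, a small formatting helper covers that, and char.ConvertFromUtf32 goes the other way, from codes back to a string. A minimal sketch; the class CodePointText and both of its method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; names are illustrative.
public static class CodePointText
{
    // "U+1F3CB, U+1F3FF, ...": comma-separated, as listed in the question.
    public static string Format(IEnumerable<int> codePoints) =>
        string.Join(", ", codePoints.Select(code => $"U+{code:X4}"));

    // Inverse direction: each code point back to its UTF-16 string
    // (char.ConvertFromUtf32 throws for surrogate values 0xD800-0xDFFF).
    public static string FromCodePoints(IEnumerable<int> codePoints) =>
        string.Concat(codePoints.Select(code => char.ConvertFromUtf32(code)));
}
```

With the answer's demo data, CodePointText.Format(CombineSurrogates(data)) gives U+1F3CB, U+1F3FF, U+200D, U+2642, U+FE0F.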