Reputation: 15049
I'm getting a string from the client side like this:
This is a face :grin:
And I need to convert the :grin: to Unicode
in order to send it to another service.
Any clue how to do that?
Upvotes: 3
Views: 4628
Reputation: 101623
Here is a link to a quite good JSON file with the relevant information. It contains a large array (about 1500 entries) of emojis, and we are interested in two properties: "short_name", which holds a name like "grin", and "unified", which holds the Unicode code points in hex, like "1F601".
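To make the "unified" format concrete, here is a minimal sketch of turning such a value into a C# string. It assumes the value is a list of hexadecimal code points separated by "-". The names UnifiedDemo and UnifiedToString are hypothetical, and the multi-code-point sample "0023-FE0F-20E3" in the note below is an assumed value for illustration, not one quoted from the file.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class UnifiedDemo
{
    // Hypothetical helper: converts a value such as "1F601" to the actual characters.
    public static string UnifiedToString(string unified)
    {
        var sb = new StringBuilder();
        foreach (var point in unified.Split('-'))
        {
            // Each part is one Unicode code point written in hex.
            int codePoint = int.Parse(point, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            // ConvertFromUtf32 returns one char, or a surrogate pair above U+FFFF.
            sb.Append(char.ConvertFromUtf32(codePoint));
        }
        return sb.ToString();
    }
}
```

UnifiedDemo.UnifiedToString("1F601") returns a two-char string (the surrogate pair "\uD83D\uDE01"), and an assumed sequence such as "0023-FE0F-20E3" returns three chars, one per code point.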
I built a helper class to replace short names like ":grin:" with their Unicode equivalents:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq;

public static class EmojiParser {
    static readonly Dictionary<string, string> _colonedEmojis;
    static readonly Regex _colonedRegex;

    static EmojiParser() {
        // load the JSON file mentioned above from somewhere
        var data = JArray.Parse(File.ReadAllText(@"C:\path\to\emoji.json"));
        _colonedEmojis = data.OfType<JObject>().ToDictionary(
            // key the dictionary by short names wrapped in colons
            c => ":" + ((JValue)c["short_name"]).Value.ToString() + ":",
            c => {
                var unicodeRaw = ((JValue)c["unified"]).Value.ToString();
                var chars = new List<char>();
                // some emojis are sequences of several code points, separated by '-'
                foreach (var point in unicodeRaw.Split('-'))
                {
                    // parse the hex code point to a 32-bit unsigned integer (UTF-32)
                    uint unicodeInt = uint.Parse(point, System.Globalization.NumberStyles.HexNumber);
                    // convert to bytes and decode them as UTF-32
                    // (Encoding.UTF32 is little-endian, which matches BitConverter on little-endian machines)
                    chars.AddRange(Encoding.UTF32.GetChars(BitConverter.GetBytes(unicodeInt)));
                }
                // this is the resulting emoji
                return new string(chars.ToArray());
            });
        // build one big regex (all ~1500 emojis combined) by joining all names with OR ("|")
        _colonedRegex = new Regex(String.Join("|", _colonedEmojis.Keys.Select(Regex.Escape)));
    }

    public static string ReplaceColonNames(string input) {
        // replace each match using the dictionary
        return _colonedRegex.Replace(input, match => _colonedEmojis[match.Value]);
    }
}
Usage is obvious:
var target = "This is a face :grin: :hash:";
target = EmojiParser.ReplaceColonNames(target);
It's quite fast (except for the first run, because of the static constructor initialization). On your string it takes less than 1 ms (I was not able to measure it with Stopwatch; it always shows 0 ms). On a huge string that you will never meet in practice (1 MB of text), it takes 300 ms on my machine.
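Stopwatch.ElapsedMilliseconds truncates to whole milliseconds, so a single short call reads as 0 ms. One way to get a usable figure is to time many iterations and divide, as in this rough sketch. The TimingDemo class, the AverageMilliseconds helper and the iteration count are assumptions for illustration; any replacement function can be passed in, such as EmojiParser.ReplaceColonNames.

```csharp
using System;
using System.Diagnostics;

public static class TimingDemo
{
    // Hypothetical helper: average wall-clock time of one call, in milliseconds.
    public static double AverageMilliseconds(Func<string, string> replace, string input, int iterations)
    {
        // Warm-up call so one-time costs (static constructor, JIT) are not counted.
        replace(input);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            replace(input);
        }
        sw.Stop();

        // Elapsed.TotalMilliseconds is fractional, unlike ElapsedMilliseconds.
        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

For example, TimingDemo.AverageMilliseconds(EmojiParser.ReplaceColonNames, "This is a face :grin:", 10000) gives the average cost per call as a fractional number of milliseconds.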
Upvotes: 2