VAAA

Reputation: 15049

C#: Convert string with Emoji to Unicode

I'm getting a string from the client side like this:

This is a face  :grin: 

I need to convert the :grin: to Unicode in order to send it to another service.

Any clue how to do that?

Upvotes: 3

Views: 4628

Answers (1)

Evk

Reputation: 101623

Here is a link to a quite good JSON file with the relevant information. It contains a large array (about 1500 entries) of emojis, and we are interested in two properties: "short_name", which holds a name like "grin", and "unified", which holds the Unicode representation like "1F601" (one or more hex code points, separated by "-" when there are several).
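To make the "unified" format concrete, here is a minimal sketch of turning such a value into a .NET string. The names UnifiedDemo and FromUnified are made up for illustration, and the sketch assumes the value is one or more hex code points joined by "-", as described above.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class UnifiedDemo
{
    // Hypothetical helper: converts a value such as "1F601" or "0023-20E3" to a string.
    public static string FromUnified(string unified)
    {
        var sb = new StringBuilder();
        foreach (var point in unified.Split('-'))
        {
            // Each part is a single code point written in hex.
            int codePoint = int.Parse(point, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            // ConvertFromUtf32 yields one char, or a surrogate pair for code points above U+FFFF.
            sb.Append(char.ConvertFromUtf32(codePoint));
        }
        return sb.ToString();
    }
}
```

For "1F601" this produces a two-char string (the surrogate pair D83D DE01), because .NET strings are UTF-16.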

I built a helper class to replace short names like ":grin:" with their unicode equivalent:

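using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq; // Json.NET, provides JArray / JObject / JValue

public static class EmojiParser {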
    static readonly Dictionary<string, string> _colonedEmojis;
    static readonly Regex _colonedRegex;
    static EmojiParser() {
        // load mentioned json from somewhere
        var data = JArray.Parse(File.ReadAllText(@"C:\path\to\emoji.json"));
        _colonedEmojis = data.OfType<JObject>().ToDictionary(
            // key dictionary by coloned short names
            c => ":" + ((JValue)c["short_name"]).Value.ToString() + ":",
            c => {
                var unicodeRaw = ((JValue)c["unified"]).Value.ToString();
                var sb = new StringBuilder();
                // some emojis are sequences of several code points joined by '-', split them
                foreach (var point in unicodeRaw.Split('-'))
                {
                    // parse the hex code point
                    int codePoint = int.Parse(point, System.Globalization.NumberStyles.HexNumber);
                    // convert to one or two UTF-16 chars (does not depend on machine byte order)
                    sb.Append(char.ConvertFromUtf32(codePoint));
                }
                // this is the resulting emoji
                return sb.ToString();
            });
        // build a huge regex (all ~1500 emojis combined) by joining all names with OR ("|")
        _colonedRegex = new Regex(String.Join("|", _colonedEmojis.Keys.Select(Regex.Escape)));
    }

    public static string ReplaceColonNames(string input) {
        // replace each match using the dictionary
        return _colonedRegex.Replace(input, match => _colonedEmojis[match.Value]);
    }
}

Usage is obvious:

var target = "This is a face  :grin: :hash:";
target = EmojiParser.ReplaceColonNames(target);
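A console or log may not render emoji, so one way to check the result is to print its code points. This is a small sketch with a made-up helper name (CodePointDump.Describe); it works on any string, so it can be tried without the JSON file.

```csharp
using System;
using System.Collections.Generic;

public static class CodePointDump
{
    // Hypothetical helper: lists the code points of a string as "U+XXXX".
    // Assumes well-formed UTF-16; a lone surrogate makes ConvertToUtf32 throw.
    public static string Describe(string text)
    {
        var parts = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            // Combines a surrogate pair starting at index i into one code point.
            int codePoint = char.ConvertToUtf32(text, i);
            parts.Add("U+" + codePoint.ToString("X4"));
            // Skip the low surrogate that was already consumed.
            if (char.IsHighSurrogate(text[i])) i++;
        }
        return string.Join(" ", parts);
    }
}
```

Calling CodePointDump.Describe(target) after the replacement should show U+1F601 where :grin: used to be.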

ReplaceColonNames is quite fast, except on the first call, which pays for the static constructor's initialization. On your string it takes less than 1 ms (I could not measure it with Stopwatch, which always showed 0 ms). On a huge string that you will never meet in practice (1 MB of text), it takes 300 ms on my machine.
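Stopwatch.ElapsedMilliseconds is a whole number of milliseconds, which is why a sub-millisecond call reads as 0. A rough way to get a finer figure is to read Elapsed.TotalMilliseconds and average over many calls. The sketch below times an arbitrary Func<string, string>; the helper name (ReplaceTimer.AverageMs) is hypothetical, and this is not a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;

public static class ReplaceTimer
{
    // Hypothetical helper: average wall-clock time per call, in milliseconds.
    public static double AverageMs(Func<string, string> replace, string input, int iterations)
    {
        // Warm-up call, so one-time initialization (e.g. a static constructor) is not timed.
        replace(input);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            replace(input);
        }
        sw.Stop();

        // TotalMilliseconds is a double, so fractions of a millisecond are preserved.
        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

With the class above it could be called as ReplaceTimer.AverageMs(EmojiParser.ReplaceColonNames, target, 10000).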

Upvotes: 2
