Yozer

Reputation: 648

Replacing unicode characters in string in C#

I have a string, for example:

string str = "ĄĆŹ - ćwrą";

How can I replace ĄĆŹ - ćą with their \u escape sequences? The result for that example string should be:

str = "\u0104\u0106\u0179 \u2013 \u0107wr\u0105"

Is there a fast replacement method? I don't want to call .Replace for each character...

Upvotes: 4

Views: 6692

Answers (1)

Jon

Reputation: 437774

Converting to a JSON string like that is more cumbersome than it should be, mainly because you need to work with Unicode code points which in practice means calling char.ConvertToUtf32. In order to do that, you need to somehow handle surrogate pairs; System.Globalization.StringInfo can help with that.
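To make the surrogate pair point concrete, here is a small sketch (the class name SurrogateDemo is just for illustration) showing that one code point above U+FFFF occupies two chars in a .NET string, and how char.ConvertToUtf32 and StringInfo see it:

```csharp
using System;
using System.Globalization;

public static class SurrogateDemo
{
    public static void Main()
    {
        // U+1F600 lies outside the Basic Multilingual Plane,
        // so it is stored as two UTF-16 chars (a surrogate pair).
        string s = char.ConvertFromUtf32(0x1F600);

        Console.WriteLine(s.Length);                                // 2
        Console.WriteLine(char.IsSurrogatePair(s, 0));              // True
        Console.WriteLine(char.ConvertToUtf32(s, 0).ToString("x")); // 1f600
        Console.WriteLine(new StringInfo(s).LengthInTextElements);  // 1
    }
}
```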

Here's a function that uses these building blocks to perform the conversion:

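using System.Globalization;
using System.Text;

string str = "ĄĆŹ - ćwrą";

public string ToJsonString(string s)
{
    // Walk the string one text element at a time so surrogate pairs stay together.
    var enumerator = StringInfo.GetTextElementEnumerator(s);
    var sb = new StringBuilder();

    while (enumerator.MoveNext())
    {
        var textElement = enumerator.GetTextElement();

        // A text element can hold several code points (base character plus
        // combining marks), so visit each code point, not just the first.
        for (var i = 0; i < textElement.Length; i += char.IsSurrogatePair(textElement, i) ? 2 : 1)
        {
            var codePoint = char.ConvertToUtf32(textElement, i);
            if (codePoint < 0x80) {
                // ASCII is copied through unchanged.
                sb.Append((char)codePoint);
            }
            else if (codePoint <= 0xffff) {
                // Anything else in the Basic Multilingual Plane fits in one \uXXXX escape.
                sb.Append("\\u").Append(codePoint.ToString("x4"));
            }
            else {
                // Above U+FFFF: emit the UTF-16 high surrogate, then the low surrogate.
                var offset = codePoint - 0x10000;
                sb.Append("\\u").Append((0xd800 + (offset >> 10)).ToString("x4"));
                sb.Append("\\u").Append((0xdc00 + (offset & 0x3ff)).ToString("x4"));
            }
        }
    }

    return sb.ToString();
}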
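As a possible simplification (not part of the approach above): a .NET string is already a sequence of UTF-16 code units, and a JSON \uXXXX escape also denotes one UTF-16 code unit, so escaping each non-ASCII char on its own produces valid surrogate pair escapes without any code point arithmetic. A minimal sketch, where the names UnicodeEscaper and EscapeNonAscii are made up for this example:

```csharp
using System;
using System.Text;

public static class UnicodeEscaper
{
    // Hypothetical helper: escapes every UTF-16 code unit outside ASCII as \uXXXX.
    // ASCII characters, including quotes and backslashes, are copied unchanged.
    public static string EscapeNonAscii(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c < 0x80)
                sb.Append(c);
            else
                sb.Append("\\u").Append(((int)c).ToString("x4"));
        }
        return sb.ToString();
    }
}
```

Calling UnicodeEscaper.EscapeNonAscii("ĄĆŹ - ćwrą") returns \u0104\u0106\u0179 - \u0107wr\u0105. The plain hyphen is ASCII, so it stays as it is; an en dash in the input would come out as \u2013.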

Upvotes: 8
