Say we have a string like the one below.
string s = "此检查项己被你忽略,请联系医生。\u2028内科";
How can I remove a Unicode character such as \u2028 from the string?
I tried the approaches from the questions below. Unfortunately, none of them worked. Any help is appreciated. Thanks.
Convert a Unicode string to an escaped ASCII string
Replace unicode escape sequences in a string
Update: Why doesn't the code below work for me?
Update: I tried to display the string in the output. The character is a line separator.
As noted by @spender in the comments above:
The fundamental premise of your question (removing unicode) is broken, because all strings are stored as unicode in memory. All the characters are unicode.
However, if your string contains the literal text "\uXXXX" (a backslash, a u, and four hex digits, rather than the character that sequence denotes) and you'd like to replace or remove it, you can use a regex pattern like this: @"\\u[0-9A-Fa-f]{4}"
Here's a complete example:
using System.Diagnostics;
using System.Text.RegularExpressions;

string noUnicode = "此检查项己被你忽略,请联系医生。内科";
// If you hard-code the string, you MUST add an `@` before it; otherwise, the compiler
// treats "\u2028" as an escape sequence and converts it to the corresponding Unicode character.
string s = @"此检查项己被你忽略,请联系医生。\u2028内科";
// Remove every literal "\uXXXX" sequence (backslash, 'u', four hex digits).
string ss = Regex.Replace(s, @"\\u[0-9A-Fa-f]{4}", string.Empty);
Debug.Print("s = " + s);
Debug.Print("ss = " + ss);
Debug.Print((ss == noUnicode).ToString());
Here's a fiddle to test, and here's its output:
Note: Since the string is hard-coded, you have to use an @ here to prevent the substring "\u2028" from being converted to the corresponding Unicode character. On the other hand, if you get the original string from somewhere else (e.g., read from a text file), the substring "\u2028" is already stored as literal text, so the above code should work without any changes.
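The difference between the two kinds of literal can be checked directly. The following is a minimal sketch, not part of the original answer; the class name `EscapeDemo` and the helper `StripEscapedText` are hypothetical names chosen for illustration.

```csharp
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Regular literal: the compiler turns \u2028 into ONE character (U+2028).
    public const string Regular = "A\u2028B";

    // Verbatim literal: the backslash, 'u', '2', '0', '2', '8' stay as SIX characters.
    public const string Verbatim = @"A\u2028B";

    // Hypothetical helper: removes literal "\uXXXX" text using the pattern from the answer.
    public static string StripEscapedText(string input) =>
        Regex.Replace(input, @"\\u[0-9A-Fa-f]{4}", string.Empty);
}
```

Here `EscapeDemo.Regular.Length` is 3 and `EscapeDemo.Verbatim.Length` is 8. `StripEscapedText(EscapeDemo.Verbatim)` returns "AB", while `StripEscapedText(EscapeDemo.Regular)` returns its input unchanged, because that string contains no backslash for the pattern to match.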
So, something like this would work exactly the same:
string s = File.ReadAllText(@"Path\to\a\Unicode\text\file\containing\the\string\'\u2028'");
string ss = Regex.Replace(s, @"\\u[0-9A-Fa-f]{4}", string.Empty);
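If the string instead contains the actual U+2028 character (the question's update suggests it is rendered as a line separator), the regex above will not match it, since there is no backslash in the string. A minimal sketch for that case follows; the class and method names are hypothetical, and the broader variant assumes you also want to drop U+2029 (paragraph separator).

```csharp
using System.Globalization;
using System.Linq;

public static class SeparatorRemover
{
    // Removes only the LINE SEPARATOR character (U+2028).
    // string.Replace(string, string) performs an ordinal comparison.
    public static string RemoveLineSeparator(string input) =>
        input.Replace("\u2028", string.Empty);

    // Broader variant: drops every character in the Unicode categories
    // Zl (line separator) and Zp (paragraph separator).
    public static string RemoveUnicodeSeparators(string input) =>
        new string(input.Where(c =>
        {
            UnicodeCategory category = char.GetUnicodeCategory(c);
            return category != UnicodeCategory.LineSeparator
                && category != UnicodeCategory.ParagraphSeparator;
        }).ToArray());
}
```

For example, `SeparatorRemover.RemoveLineSeparator("此检查项己被你忽略,请联系医生。\u2028内科")` returns the string without the separator. Ordinary line breaks such as \r and \n are control characters, not Zl or Zp, so neither method touches them.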