Utsab

Reputation: 209

Regular expression to match the string representation of the characters \u0000-\u007F in a string and replace it with an empty string in C#?

I am getting the hexadecimal escape representation of Unicode characters as literal text in my string and want to replace it with an empty string. More specifically, I am trying to use a regex in C# to match every escape in the range \u0000-\u007F and replace it with an empty string.

Example 1:

InputString: "\u007FTestString"

ExpectedResult: TestString

Example 2:

InputString: "\u007FTestString\U0000"

ExpectedResult: TestString

My current solution is

            if (!string.IsNullOrWhiteSpace(testString))
            {
                return Regex.Replace(testString, @"[^\u0000-\u007F]", string.Empty);
            }

but it does not match the hexadecimal representation of the character. How do I get it to match the \u0000-\u007F escapes in the string?

Any help is appreciated. Thank you!

Upvotes: 1

Views: 567

Answers (1)

Wiktor Stribiżew

Reputation: 627082

You can use

var result = Regex.Replace(@"\u007FTestString\U0000", @"\\[uU]00[0-7][0-9A-Fa-f]", "");

The @"..." verbatim string literal syntax is required so that every backslash is a literal character rather than the start of a string escape sequence.
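To illustrate the difference between the two literal forms, here is a minimal sketch (the class name VerbatimDemo and the sample text are made up for the demonstration). In a regular literal the compiler converts \u007F into a single DEL character, while the verbatim literal keeps the six characters of the escape text, which is the form the pattern above is written to match. Note that \U0000 could not be written in a regular literal at all, since the \U escape expects eight hex digits.

```csharp
using System;

class VerbatimDemo
{
    static void Main()
    {
        // Regular literal: the compiler turns \u007F into one DEL character.
        string regular = "\u007FTest";

        // Verbatim literal: backslash, 'u' and four hex digits stay as six characters.
        string verbatim = @"\u007FTest";

        Console.WriteLine(regular.Length);   // 5
        Console.WriteLine(verbatim.Length);  // 10
    }
}
```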

Pattern details:

  • \\ - a backslash
  • [uU] - u or U
  • 00 - two zeros
  • [0-7] - a digit from zero to seven
  • [0-9A-Fa-f] - an ASCII hex digit char.
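As a rough sketch of how this pattern could be wrapped for reuse, the following applies it to both inputs from the question. The names EscapeStripper and StripAsciiEscapes are hypothetical, and the null/whitespace guard is carried over from the question's code rather than being required by the regex.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeStripper
{
    // Pattern from the answer: backslash, u or U, "00", a digit 0-7, one hex digit.
    private static readonly Regex AsciiEscape =
        new Regex(@"\\[uU]00[0-7][0-9A-Fa-f]");

    // Hypothetical helper: removes the literal text of escapes \u0000 through \u007F.
    public static string StripAsciiEscapes(string input)
    {
        if (string.IsNullOrWhiteSpace(input))
        {
            return input;
        }
        return AsciiEscape.Replace(input, string.Empty);
    }

    static void Main()
    {
        Console.WriteLine(StripAsciiEscapes(@"\u007FTestString"));       // TestString
        Console.WriteLine(StripAsciiEscapes(@"\u007FTestString\U0000")); // TestString
        // Escapes above 007F are left alone because the third digit must be 0-7.
        Console.WriteLine(StripAsciiEscapes(@"caf\u00E9"));              // caf\u00E9
    }
}
```

The pattern is deliberately narrow: it only removes escapes with exactly four hex digits starting with 00 and a third digit of 0 to 7, so longer forms such as an eight-digit \U escape are not fully matched.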

Upvotes: 1
