DoctorAV

Reputation: 1189

Get removed characters from a string

I am using Regex to remove unwanted characters from a string, as shown below:

str = System.Text.RegularExpressions.Regex.Replace(str, @"[^\u0020-\u007E]", "");

How can I efficiently retrieve the distinct characters that will be removed?

EDIT:

Sample input  : str         = "This☺ contains Åüsome æspecialæ characters"
Sample output : str         = "This contains some special characters"
                removedchar = "☺,Å,ü,æ"

Upvotes: 1

Views: 77

Answers (2)

Wiktor Stribiżew

Reputation: 626689

Here is an example of how you can do it with a callback method, using the Regex.Replace overload that takes an evaluator:

evaluator
           Type: System.Text.RegularExpressions.MatchEvaluator
            A custom method that examines each match and returns either the original matched string or a replacement string.

C# demo:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Test
{
    public static List<string> characters = new List<string>();
    public static void Main()
    {
        // Repl is called once per match; its return value replaces the match.
        var str = Regex.Replace("§My string 123”˝", @"[^\u0020-\u007E]", Repl);
        Console.WriteLine(str); // => My string 123
        Console.WriteLine(string.Join(", ", characters)); // => §, ”, ˝
    }

    public static string Repl(Match m)
    {
        characters.Add(m.Value);
        return string.Empty;
    }
}

See IDEONE demo

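In short, declare a shared static field (here, a list of strings named characters) and initialize it. Then add the Repl method to handle the replacement: each time Regex.Replace calls it, it adds the matched value to the characters list and returns an empty string. Note that the list records every match, so a character that occurs more than once is added more than once.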
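Since the question asks for distinct characters, one possible variation is to filter repeats with a HashSet while keeping a list for first-seen order. This is only a sketch: the AsciiFilter class and its Strip method are hypothetical names introduced for illustration, and a lambda stands in for the separate Repl method.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class AsciiFilter
{
    // Same character class as in the question: anything outside U+0020..U+007E.
    private static readonly Regex NonPrintableAscii = new Regex(@"[^\u0020-\u007E]");

    // Returns the cleaned string; 'removed' receives each distinct removed
    // character once, in order of first appearance.
    public static string Strip(string input, out List<string> removed)
    {
        var seen = new HashSet<string>();
        var ordered = new List<string>();

        string cleaned = NonPrintableAscii.Replace(input, m =>
        {
            // HashSet.Add returns false for a value that is already present.
            if (seen.Add(m.Value))
            {
                ordered.Add(m.Value);
            }
            return string.Empty;
        });

        removed = ordered;
        return cleaned;
    }
}
```

Called as AsciiFilter.Strip(str, out var removed) on the sample input from the question, this should return "This contains some special characters", and string.Join(",", removed) should give "☺,Å,ü,æ" (assuming the accented letters are stored as single precomposed characters).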

Upvotes: 1

David Pilkington

Reputation: 13620

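// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
// str is the input string from the question.

// Negated class: match the characters that WOULD be removed,
// i.e. everything outside U+0020..U+007E.
string pattern = @"[^\u0020-\u007E]";
Regex rgx = new Regex(pattern);
List<string> matches = new List<string>();

foreach (Match match in rgx.Matches(str))
{
    // Keep each removed character only once.
    if (!matches.Contains(match.Value))
    {
        matches.Add(match.Value);
    }
}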
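If only the list of distinct out-of-range characters is needed, the same idea can probably be expressed without Regex at all, using LINQ over the string's characters. This is a different technique from the loop above, shown as a sketch; RemovedChars and Find are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class RemovedChars
{
    // Returns each character outside U+0020..U+007E exactly once.
    public static List<char> Find(string input)
    {
        return input
            .Where(c => c < '\u0020' || c > '\u007E') // same range as the regex class
            .Distinct()                               // drop repeated characters
            .ToList();
    }
}
```

string.Join(",", RemovedChars.Find(str)) then produces the comma-separated form shown in the question.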

Upvotes: 2
