user1926567

Reputation: 153

How to Replace Multiple Words in a String Using C#?

I'm wondering how I can replace (remove) multiple words (like 500+) from a string. I know I can use the replace function to do this for a single word, but what if I want to replace 500+ words? I'm interested in removing all generic keywords from an article (such as "and", "I", "you" etc).

Here is the code for 1 replacement.. I'm looking to do 500+..

        string a = "why and you it";
        string b = a.Replace("why", "");
        MessageBox.Show(b);

Thanks

@Sergey Kucher: Text size will vary from a few hundred words to a few thousand. I am replacing these words in random articles.

Upvotes: 8

Views: 4743

Answers (6)

xanatos

Reputation: 111860

I would normally do something like:

// If you want the search/replace to be case sensitive, remove the 
// StringComparer.OrdinalIgnoreCase
Dictionary<string, string> replaces = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) { 
    // The format is word to be searched, word that should replace it
    // or String.Empty to simply remove the offending word
    { "why", "xxx" }, 
    { "you", "yyy" },
};

void Main()
{
    string a = "why and you it and You it";

    // This will search for blocks of letters and numbers (abc/abcd/ab1234)
    // and pass it to the replacer
    string b = Regex.Replace(a, @"\w+", Replacer);
}

string Replacer(Match m)
{
    string found = m.ToString();

    string replace;

    // If the word found is in the dictionary then it's placed in the 
    // replace variable by the TryGetValue
    if (!replaces.TryGetValue(found, out replace))
    {
        // otherwise replace the word with the same word (so do nothing)
        replace = found;
    }
    else
    {
        // The word is in the dictionary. replace now contains the
        // word that will substitute it.

        // At this point you could add some code to maintain upper/lower 
        // case between the words (so that if you -> xxx then You becomes Xxx
        // and YOU becomes XXX)
    }

    return replace;
}

As someone else wrote, but without problems with substrings (the ass principle... You don't want to remove asses from classes :-) ), and working only if you only need to remove words:

var escapedStrings = yourReplaces.Select(Regex.Escape);
string result = Regex.Replace(yourInput, @"\b(" + string.Join("|", escapedStrings) + @")\b", string.Empty);

I use the \b word boundary... It's a little complex to explain what it is, but it's useful for finding word boundaries :-)
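To make the effect of \b concrete, here is a minimal sketch. The class name WordBoundaryDemo, the method RemoveWords and the sample word list are made up for illustration; only Regex.Replace, Regex.Escape and RegexOptions.IgnoreCase come from the framework. It shows that a \b-delimited pattern leaves "classes" alone, while a plain string.Replace cuts into it.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Hypothetical helper: removes whole-word occurrences of the given words,
    // ignoring case. \b only matches between a word character and a
    // non-word character (or the start/end of the string).
    public static string RemoveWords(string input, string[] words)
    {
        string alternatives = string.Join("|", words.Select(w => Regex.Escape(w)));
        string pattern = @"\b(" + alternatives + @")\b";
        return Regex.Replace(input, pattern, string.Empty, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string[] words = { "ass", "and" };

        // Whole words only: "classes" survives, "and" and "ass" are removed.
        Console.WriteLine("[" + RemoveWords("classes and ass", words) + "]");

        // Plain substring replacement also hits the "ass" inside "classes".
        Console.WriteLine("[" + "classes and ass".Replace("ass", "") + "]");
    }
}
```

Note that the spaces around a removed word are left behind, so you may want to collapse repeated whitespace afterwards. Also, \b assumes the words begin and end with letters, digits or underscores; a word such as "C#" would need a different delimiter.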

Upvotes: 8

It'sNotALie.

Reputation: 22794

Regex can do this better. You just need all the words to replace in a list, and then:

var escapedStrings = yourReplaces.Select(PadAndEscape);
// Replace each space-padded word with a single space so that its neighbours stay separated
string result = Regex.Replace(yourInput, string.Join("|", escapedStrings), " ");

This requires a function that space-pads the strings before escaping them:

public string PadAndEscape(string s)
{
    return Regex.Escape(" " + s + " ");
}

Upvotes: 0

Tomer W

Reputation: 3443

It depends on the situation, of course, but if your text is long, you have many words, and you want to optimize performance, you should build a trie from the words and search the trie for a match.

It won't lower the order of complexity, which is still O(nm), but for large groups of words it can check multiple words against each character instead of one by one.
I'd assume a couple of hundred words should be enough to make this faster.

This is the fastest method in my opinion, and I have written a function for you to start with:

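// Requires: using System.Collections.Generic; using System.Linq;
public struct FindRecord
{
    public int WordIndex;         // index of the matched word in the words array
    public int PositionInString;  // index in the input where the match starts
}

// Note: on a mismatch the partial match is simply reset, so a word that
// starts inside a failed partial match (e.g. "ab" in "aab") is missed.
// All words must be non-empty.
public static FindRecord[] FindAll(string input, string[] words)
{
    LinkedList<FindRecord> result = new LinkedList<FindRecord>();
    // matchs[j] = number of leading characters of words[j] matched so far
    int[] matchs = new int[words.Length];

    for (int i = 0; i < input.Length; i++)
    {
        for (int j = 0; j < words.Length; j++)
        {
            if (input[i] == words[j][matchs[j]])
            {
                matchs[j]++;
                if (matchs[j] == words[j].Length)
                {
                    // Whole word matched: record it and start over for this word
                    FindRecord findRecord = new FindRecord { WordIndex = j, PositionInString = i - matchs[j] + 1 };
                    result.AddLast(findRecord);
                    matchs[j] = 0;
                }
            }
            else
            {
                matchs[j] = 0;
            }
        }
    }
    return result.ToArray();
}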
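The function above still compares every word against each character. A trie-based variant might look like the following minimal sketch. The class WordTrie and its members are hypothetical names chosen for illustration, and this version matches substrings rather than whole words, so it assumes you check word boundaries separately if you need them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a minimal trie over the word list. Each node maps
// a character to the next node; IsWord marks the end of a listed word.
public sealed class WordTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;
    }

    private readonly Node root = new Node();

    public WordTrie(IEnumerable<string> words)
    {
        foreach (string word in words)
        {
            Node node = root;
            foreach (char c in word)
            {
                Node next;
                if (!node.Children.TryGetValue(c, out next))
                {
                    next = new Node();
                    node.Children.Add(c, next);
                }
                node = next;
            }
            node.IsWord = true;
        }
    }

    // Returns the length of the longest listed word that starts at
    // input[start], or 0 if none does.
    public int LongestMatchAt(string input, int start)
    {
        Node node = root;
        int best = 0;
        for (int i = start; i < input.Length; i++)
        {
            if (!node.Children.TryGetValue(input[i], out node))
                break;                    // no listed word continues with this character
            if (node.IsWord)
                best = i - start + 1;     // remember the longest complete word so far
        }
        return best;
    }

    // Returns (position, length) pairs of non-overlapping matches, left to right.
    public List<KeyValuePair<int, int>> FindAll(string input)
    {
        var result = new List<KeyValuePair<int, int>>();
        int i = 0;
        while (i < input.Length)
        {
            int length = LongestMatchAt(input, i);
            if (length > 0)
            {
                result.Add(new KeyValuePair<int, int>(i, length));
                i += length;              // skip past the matched word
            }
            else
            {
                i++;
            }
        }
        return result;
    }
}
```

Because every position walks the trie once, all words sharing a prefix are tested together, and the work per position is bounded by the length of the longest word rather than by the number of words.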

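Another option:
This might be the rare case where a regex is faster than hand-written code.

Try using

// Requires: using System.Linq; using System.Text.RegularExpressions;
public static string ReplaceAll(string input, string[] words)
{
    // Escape each word so regex metacharacters in it are matched literally
    string wordlist = string.Join("|", words.Select(w => Regex.Escape(w)));
    Regex rx = new Regex(wordlist, RegexOptions.Compiled);
    return rx.Replace(input, m => "");
}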

Upvotes: 0

No Idea For Name

Reputation: 11577

If you are talking about a single string, the solution is to remove them all with the simple Replace method. As its documentation says:

"Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string".

You may need to replace several words, so you can make a list of these words:

List<string> wordsToRemove = new List<string>();
wordsToRemove.Add("why");
wordsToRemove.Add("how");

and so on

and then remove them from the string

foreach(string curr in wordsToRemove)
   a = a.ToLower().Replace(curr, "");

Important

If you want to keep your string as it was, without lowercasing it and without struggling with upper and lower case, use:

foreach (string curr in wordsToRemove)
{
    // You can reuse this object
    // Escape the word so regex metacharacters in it are matched literally
    Regex regex = new Regex(Regex.Escape(curr), RegexOptions.IgnoreCase);
    myString = regex.Replace(myString, "");
}

Upvotes: 0

Vano Maisuradze

Reputation: 5899

Try this:

string text = "word1 word2 you it";
List<string> words = new System.Collections.Generic.List<string>();
words.Add("word1");
words.Add("word2");
words.ForEach(w => text = text.Replace(w, ""));

Edit

If you want to replace text with another text, you can create class Word:

 public class Word
 {
     public string SearchWord { get; set; }
     public string ReplaceWord { get; set; }
 }

And change above code to this:

string text = "word1 word2 you it";
List<Word> words = new System.Collections.Generic.List<Word>();
words.Add(new Word() { SearchWord = "word1", ReplaceWord = "replaced" });
words.Add(new Word() { SearchWord = "word2", ReplaceWord = "replaced" });
words.ForEach(w => text = text.Replace(w.SearchWord, w.ReplaceWord));

Upvotes: 0

Vaughan Hilts

Reputation: 2879

Create a list of all the text you want to replace and load it into a list. You can do this fairly simply or make it very complex. A trivial example would be:

var sentence = "mysentence hi";
// One word per line in the file
var words = File.ReadAllLines("pathtowordlist.txt");
foreach (var word in words)
   sentence = sentence.Replace(word, "x");  // strings are immutable, so keep the result

You could create two lists if you wanted a dual mapping scheme.
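A dual mapping with two parallel lists could be sketched as follows. The names DualListDemo and ReplaceAll are hypothetical, and the sketch assumes that entry i of the first list is replaced by entry i of the second and that no search entry is empty (string.Replace rejects an empty search string).

```csharp
using System;
using System.Collections.Generic;

public static class DualListDemo
{
    // Hypothetical helper: replaces search[i] with replace[i] for every i.
    // Both lists must have the same number of entries.
    public static string ReplaceAll(string sentence, IList<string> search, IList<string> replace)
    {
        if (search.Count != replace.Count)
            throw new ArgumentException("Both lists must have the same number of entries.");

        for (int i = 0; i < search.Count; i++)
            sentence = sentence.Replace(search[i], replace[i]);

        return sentence;
    }

    public static void Main()
    {
        var search = new List<string> { "hi", "mysentence" };
        var replace = new List<string> { "hello", "my sentence" };
        Console.WriteLine(ReplaceAll("mysentence hi", search, replace));
    }
}
```

Replacements are applied in list order, so a later entry can also match text produced by an earlier one.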

Upvotes: 0
