Rafael Herscovici

Reputation: 17094

Replace multiple words in a string from a list of words

I have a list of words:

string[] BAD_WORDS = { "xxx", "o2o" }; // My list is actually a lot bigger, about 100 words

I also have some text (usually short, at most 250 words) from which I need to remove all the BAD_WORDS.

I have tried this:

    foreach (var word in BAD_WORDS)
    {
        string w = string.Format(" {0} ", word);
        if (input.Contains(w))
        {
            while (input.Contains(w))
            {
                input = input.Replace(w, " ");
            }
        }
    }

However, if the text starts or ends with a bad word, that word is not removed. I added the surrounding spaces so that partial words are not matched. For example, "oxxx" should not be removed, since it is not an exact match for any of the BAD_WORDS.

Can anyone give me advice on this?

Upvotes: 7

Views: 11828

Answers (7)

James Ellis-Jones

Reputation: 3092

This is a great task for LINQ and the Split method. Try this:

return string.Join(" ", input.Split(' ').Where(w => !BAD_WORDS.Contains(w)));

Upvotes: 6

shannon

Reputation: 8774

string cleaned = Regex.Replace(input, "\\b" + string.Join("\\b|\\b", BAD_WORDS) + "\\b", "");
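
As a slightly fuller sketch of the same idea, the pattern can be built once, with each word passed through Regex.Escape so that a word containing regex metacharacters is still matched literally. The names BadWordFilter and Clean are made up for illustration, the two-word list stands in for the real one, and collapsing the leftover whitespace is an assumption about the desired output rather than part of the one-liner above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BadWordFilter
{
    // Hypothetical stand-in for the real list of about 100 words.
    private static readonly string[] BAD_WORDS = { "xxx", "o2o" };

    // Builds \b(?:xxx|o2o)\b : whole-word matches only, each word escaped.
    private static readonly Regex BadWordPattern = new Regex(
        @"\b(?:" + string.Join("|", BAD_WORDS.Select(w => Regex.Escape(w))) + @")\b");

    public static string Clean(string input)
    {
        string withoutBadWords = BadWordPattern.Replace(input, "");
        // Assumption: collapse the gaps left behind into single spaces and trim the ends.
        return Regex.Replace(withoutBadWords, @"\s+", " ").Trim();
    }
}
```

Note that \b only marks a boundary next to a letter, digit, or underscore, so this works as intended when every bad word starts and ends with such a character, as "xxx" and "o2o" do. Matching is case-sensitive unless RegexOptions.IgnoreCase is passed to the Regex constructor.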

Upvotes: 18

Yanga

Reputation: 3012

According to the post "Replacing multiple characters in a string, the fastest way?", the fastest approach is to use Regex with a MatchEvaluator:

        // \b keeps partial words such as "oxxx" from matching.
        Regex reg = new Regex(@"\b(o2o|xxx)\b");
        MatchEvaluator eval = match =>
        {
            // Choose the replacement for each matched word.
            switch (match.Value)
            {
                case "o2o": return " ";
                case "xxx": return " ";
                default: throw new Exception("Unexpected match!");
            }
        };
        input = reg.Replace(input, eval);

Upvotes: 0

David Limkys

Reputation: 5123

Just wanted to point out that you could have done this with just a while inside your foreach, like so:

foreach (var word in BAD_WORDS)
{
    while (input.Contains(String.Format(" {0} ", word)))
    {
        input = input.Replace(String.Format(" {0} ", word), " ");
    }
}

There is no need for the if or the 'w' variable. In any case, I would have used Antonio Bakula's answer; this was just the first thing that came to mind.

Upvotes: 0

Antonio Bakula

Reputation: 20693

You can split the text into a list of words, then remove every word that appears in the bad-word list. Something like this:

List<string> myWords = input.Split(' ').ToList();
List<string> badWords = GetBadWords();

myWords.RemoveAll(word => badWords.Contains(word));
string Result = string.Join(" ", myWords);
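
With around 100 bad words, List.Contains scans the whole list for every word of the text. A HashSet makes each lookup constant time on average. The following is a minimal sketch under a few assumptions: the names WordListFilter and Clean are hypothetical, the two-word set stands in for whatever GetBadWords() returns, and the case-insensitive comparison is my own choice rather than something the question asks for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordListFilter
{
    // Hypothetical stand-in for the real list; OrdinalIgnoreCase is an assumption.
    private static readonly HashSet<string> BadWords =
        new HashSet<string>(new[] { "xxx", "o2o" }, StringComparer.OrdinalIgnoreCase);

    public static string Clean(string input)
    {
        // RemoveEmptyEntries drops the empty tokens produced by repeated spaces.
        IEnumerable<string> kept = input
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !BadWords.Contains(word));
        return string.Join(" ", kept);
    }
}
```

Because the text is split on spaces only, a bad word with punctuation attached (for example "xxx,") is treated as a different word and is kept.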

Upvotes: 1

Jeremy Thompson

Reputation: 65544

Put a fake space before and after the string variable input. That way the loop will also detect the first and last words.

input = " " + input + " ";

foreach (var word in BAD_WORDS)
{
    string w = string.Format(" {0} ", word);
    if (input.Contains(w))
    {
        while (input.Contains(w))
        {
            input = input.Replace(w, " ");
        }
    }
}

Then trim the string:

input = input.Trim();

Upvotes: 1

Kundan Singh Chouhan

Reputation: 14282

You could use the StartsWith and EndsWith methods, like:

while (input.Contains(w) || input.StartsWith(w) || input.EndsWith(w) || input.IndexOf(w) > 0)
{
   input = input.Replace(w, " ");
}

Hope this will fix your problem.

Upvotes: 1
