Reputation: 153
I'm wondering how I can replace (remove) multiple words (like 500+) from a string. I know I can use the replace function to do this for a single word, but what if I want to replace 500+ words? I'm interested in removing all generic keywords from an article (such as "and", "I", "you" etc).
Here is the code for a single replacement. I'm looking to do 500+.
string a = "why and you it";
string b = a.Replace("why", "");
MessageBox.Show(b);
Thanks
@ Sergey Kucher Text size will vary from a few hundred words to a few thousand. I am removing these words from random articles.
Upvotes: 8
Views: 4743
Reputation: 111860
I would normally do something like:
// If you want the search/replace to be case sensitive, remove the
// StringComparer.OrdinalIgnoreCase
Dictionary<string, string> replaces = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) {
// The format is word to be searched, word that should replace it
// or String.Empty to simply remove the offending word
{ "why", "xxx" },
{ "you", "yyy" },
};
void Main()
{
string a = "why and you it and You it";
// This will search for blocks of letters and numbers (abc/abcd/ab1234)
// and pass it to the replacer
string b = Regex.Replace(a, @"\w+", Replacer);
}
string Replacer(Match m)
{
string found = m.ToString();
string replace;
// If the word found is in the dictionary then it's placed in the
// replace variable by the TryGetValue
if (!replaces.TryGetValue(found, out replace))
{
// otherwise replace the word with the same word (so do nothing)
replace = found;
}
else
{
// The word is in the dictionary. replace now contains the
// word that will substitute it.
// At this point you could add some code to maintain upper/lower
// case between the words (so that if you -> xxx then You becomes Xxx
// and YOU becomes XXX)
}
return replace;
}
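The comment in the else branch above mentions keeping upper/lower case between the found word and its substitute. A minimal sketch of one way to do that follows; the names CasePreservingReplacer, MatchCase and ReplaceWords are hypothetical, and it only distinguishes three cases (all upper, first letter upper, anything else), which is an assumption about what "maintain case" should mean:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CasePreservingReplacer
{
    // Hypothetical helper: copies the casing pattern of 'source' onto 'replacement'.
    public static string MatchCase(string source, string replacement)
    {
        if (source.Length == 0 || replacement.Length == 0)
            return replacement;

        // Whole word in upper case (YOU -> XXX). Single letters such as "I"
        // are left to the "first letter" rule below.
        if (source.Length > 1
            && source == source.ToUpperInvariant()
            && source != source.ToLowerInvariant())
            return replacement.ToUpperInvariant();

        // Only the first letter in upper case (You -> Xxx).
        if (char.IsUpper(source[0]))
            return char.ToUpperInvariant(replacement[0]) + replacement.Substring(1);

        // Anything else: use the replacement as written in the dictionary.
        return replacement;
    }

    // 'replaces' is assumed to use a case-insensitive comparer,
    // like the dictionary in the answer above.
    public static string ReplaceWords(string input, IDictionary<string, string> replaces)
    {
        return Regex.Replace(input, @"\w+", m =>
        {
            string replacement;
            return replaces.TryGetValue(m.Value, out replacement)
                ? MatchCase(m.Value, replacement)
                : m.Value; // not in the dictionary: keep the word unchanged
        });
    }
}
```

With a dictionary mapping "you" to "xxx", the input "you You YOU it" should come back as "xxx Xxx XXX it".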
As someone else wrote, but without problems with substrings (the ass principle... You don't want to remove asses from classes :-) ), and working only if you only need to remove words:
var escapedStrings = yourReplaces.Select(Regex.Escape);
string result = Regex.Replace(yourInput, @"\b(" + string.Join("|", escapedStrings) + @")\b", string.Empty);
I use the \b word boundary anchor. It matches the position between a word character (a letter, digit or underscore) and a non-word character, so the pattern only matches whole words :-)
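To make the effect of \b concrete, here is a small self-contained sketch of the same idea. StopWordRemover and RemoveWords are hypothetical names; the case-insensitive matching and the whitespace clean-up at the end are assumptions added for the article use case, not part of the two-line snippet above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StopWordRemover
{
    public static string RemoveWords(string input, IEnumerable<string> words)
    {
        // Escape every word, join them into one alternation and wrap it in
        // \b...\b so that only whole words match ("ass" is not removed from "classes").
        string alternation = string.Join("|", words.Select(w => Regex.Escape(w)));
        string pattern = @"\b(?:" + alternation + @")\b";

        string stripped = Regex.Replace(input, pattern, string.Empty, RegexOptions.IgnoreCase);

        // Removing words leaves runs of spaces behind; collapse them.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

For example, removing "you", "and", "I" and "ass" from "You and I like classes" should give "like classes": the word "classes" is untouched because there is no word boundary after its embedded "ass".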
Upvotes: 8
Reputation: 22794
Regex can do this better. You just need all the words to replace in a list, and then:
var escapedStrings = yourReplaces.Select(PadAndEscape);
// Replace each space-padded match with a single space so the neighbouring words stay separated
string result = Regex.Replace(yourInput, string.Join("|", escapedStrings), " ");
This requires a function that space-pads the strings before escaping them:
public string PadAndEscape(string s)
{
return Regex.Escape(" " + s + " ");
}
Upvotes: 0
Reputation: 3443
It depends on the situation, of course, but if your text is long, you have many words, and you want to optimize performance, you should build a trie from the words and search the trie for a match.
It won't lower the order of complexity, which is still O(nm), but for large groups of words it can check multiple words against each character instead of one by one.
I assume a couple of hundred words should be enough for this to be faster.
This is the fastest method in my opinion, and I have written a function for you to start with:
public struct FindRecord
{
public int WordIndex;
public int PositionInString;
}
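// Requires: using System.Collections.Generic; using System.Linq;
// Note: this keeps one match counter per word instead of building an actual trie.
public static FindRecord[] FindAll(string input, string[] words)
{
    LinkedList<FindRecord> result = new LinkedList<FindRecord>();
    // matchs[j] = number of leading characters of words[j] matched so far
    int[] matchs = new int[words.Length];
    for (int i = 0; i < input.Length; i++)
    {
        for (int j = 0; j < words.Length; j++)
        {
            // On a mismatch, restart the word and re-test the current character
            // against its first letter (otherwise "ab" is missed in "aab").
            // Words whose prefix overlaps itself (e.g. "aab" in "aaab") would
            // still need a KMP-style fallback table.
            if (input[i] != words[j][matchs[j]])
                matchs[j] = 0;
            if (input[i] == words[j][matchs[j]])
            {
                matchs[j]++;
                if (matchs[j] == words[j].Length)
                {
                    // Whole word matched: record which word and where it starts
                    FindRecord findRecord = new FindRecord { WordIndex = j, PositionInString = i - matchs[j] + 1 };
                    result.AddLast(findRecord);
                    matchs[j] = 0;
                }
            }
        }
    }
    return result.ToArray();
}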
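The function above tracks one counter per word rather than building the trie described in the text. For comparison, a minimal sketch of an actual trie follows. WordTrie and its members are hypothetical names; like FindAll, it is case-sensitive and matches substrings rather than whole words, and it removes the longest stored word found at each position:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class WordTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord; // true if the path from the root to this node spells a stored word
    }

    private readonly Node root = new Node();

    public void Add(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            Node next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.IsWord = true;
    }

    // Length of the longest stored word that starts at input[start], or 0 if none does.
    public int LongestMatch(string input, int start)
    {
        Node node = root;
        int best = 0;
        for (int i = start; i < input.Length; i++)
        {
            // One dictionary lookup per character checks all stored words at once.
            if (!node.Children.TryGetValue(input[i], out node))
                break;
            if (node.IsWord)
                best = i - start + 1;
        }
        return best;
    }

    // Copies the input, skipping every stored word it finds.
    public string RemoveAll(string input)
    {
        var sb = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            int len = LongestMatch(input, i);
            if (len > 0)
                i += len;               // skip the matched word
            else
                sb.Append(input[i++]);  // keep the character
        }
        return sb.ToString();
    }
}
```

Usage would be to call Add once per word and then RemoveAll on each article. To remove whole words only, one option is to call LongestMatch only at positions where a word starts and to check that the match ends at a word boundary.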
Another option:
This might be one of the rare cases where a regex is faster than hand-written code.
Try using
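// Requires: using System.Linq; using System.Text.RegularExpressions;
public static string ReplaceAll(string input, string[] words)
{
    // Escape each word so regex metacharacters in it are matched literally,
    // and wrap the alternation in \b so only whole words are removed
    string wordlist = @"\b(?:" + string.Join("|", words.Select(w => Regex.Escape(w))) + @")\b";
    Regex rx = new Regex(wordlist, RegexOptions.Compiled);
    return rx.Replace(input, m => "");
}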
Upvotes: 0
Reputation: 11577
If you are talking about a single string, the solution is to remove them all with the simple Replace method. As its documentation says:
"Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string".
You may need to replace several words, so you can make a list of these words:
List<string> wordsToRemove = new List<string>();
wordsToRemove.Add("why");
wordsToRemove.Add("how");
and so on,
and then remove them from the string:
foreach(string curr in wordsToRemove)
a = a.ToLower().Replace(curr, "");
Important
If you want to keep your string as it was, without lowercasing it and without struggling with upper and lower case, use:
foreach (string curr in wordsToRemove)
{
    // Escape the word so regex metacharacters in it are matched literally
    Regex regex = new Regex(Regex.Escape(curr), RegexOptions.IgnoreCase);
    myString = regex.Replace(myString, "");
}
Upvotes: 0
Reputation: 5899
Try this:
string text = "word1 word2 you it";
List<string> words = new System.Collections.Generic.List<string>();
words.Add("word1");
words.Add("word2");
words.ForEach(w => text = text.Replace(w, ""));
Edit
If you want to replace text with other text, you can create a Word class:
public class Word
{
public string SearchWord { get; set; }
public string ReplaceWord { get; set; }
}
And change above code to this:
string text = "word1 word2 you it";
List<Word> words = new System.Collections.Generic.List<Word>();
words.Add(new Word() { SearchWord = "word1", ReplaceWord = "replaced" });
words.Add(new Word() { SearchWord = "word2", ReplaceWord = "replaced" });
words.ForEach(w => text = text.Replace(w.SearchWord, w.ReplaceWord));
Upvotes: 0
Reputation: 2879
Put all the words you want to replace in a file and load them into a list. You can do this fairly simply or make it very complex. A trivial example would be:
// Requires: using System.IO;
var sentence = "mysentence hi";
// One word per line in the file
var words = File.ReadAllLines("pathtowordlist.txt");
foreach (var word in words)
    // Strings are immutable, so assign the result back
    sentence = sentence.Replace(word, "x");
You could create two lists if you wanted a dual mapping scheme.
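One reading of the "two lists" idea is a pair of parallel lists, where the word at index i in the first list is replaced by the word at index i in the second. The sketch below assumes that reading; ParallelListReplacer and its parameter names are hypothetical. Like the loop above, it replaces substrings, not whole words, and applies the pairs in order:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ParallelListReplacer
{
    public static string ReplaceAll(string input, IList<string> searches, IList<string> replacements)
    {
        if (searches.Count != replacements.Count)
            throw new ArgumentException("Both lists must have the same number of entries.");

        // StringBuilder avoids allocating a new string for every replacement.
        var sb = new StringBuilder(input);
        for (int i = 0; i < searches.Count; i++)
        {
            // Pair i: searches[i] -> replacements[i]
            sb.Replace(searches[i], replacements[i]);
        }
        return sb.ToString();
    }
}
```

For example, replacing { "mysentence", "hi" } with { "x", "y" } in "mysentence hi" should give "x y". Because the pairs are applied one after another, a later pair can also match text produced by an earlier one.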
Upvotes: 0