Reputation: 5621
Given
var stringList = new List<string> { "outage", "restoration", "efficiency" };
var queryText = "While walking through the park one day, I noticed an outage " +
    "in the lightbulb at the plant. I talked to an officer about " +
    "restoration protocol for public works, and he said to contact " +
    "the department of public works, but not to expect much because " +
    "they have low efficiency.";
How do I get the total number of occurrences of all the strings in stringList within queryText?
In the above example, I would want a method that returns 3:
private int stringMatches(string textToQuery, string[] stringsToFind)
{
    // ...
}
RESULTS
SPOKE TOO SOON!
Ran a couple of performance tests, and this branch of code from Fabian was faster by a good margin:
private int stringMatches(string textToQuery, string[] stringsToFind)
{
    int count = 0;
    foreach (var stringToFind in stringsToFind)
    {
        int currentIndex = 0;
        while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
        {
            currentIndex++;
            count++;
        }
    }
    return count;
}
Execution time over a 10,000-iteration loop, measured with Stopwatch:
Fabian: 37-42 milliseconds
lazyberezovsky StringCompare: 400-500 milliseconds
lazyberezovsky Regex: 630-680 milliseconds
Glenn: 750-800 milliseconds
(Added StringComparison.Ordinal to Fabian's answer for additional speed.)
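For anyone who wants to repeat the comparison, here is a rough sketch of how such a timing loop might look. The class name MatchBenchmark and the helper TimeIt are made up for illustration, and this is not the exact harness used for the numbers above; absolute timings will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;

static class MatchBenchmark
{
    // The IndexOf-based counter from the question, repeated so the sketch compiles on its own.
    public static int StringMatches(string textToQuery, string[] stringsToFind)
    {
        int count = 0;
        foreach (var stringToFind in stringsToFind)
        {
            int currentIndex = 0;
            while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
            {
                currentIndex++;
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: runs the action the given number of times
    // and returns the elapsed wall-clock time in milliseconds.
    public static long TimeIt(Func<int> action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not part of the measurement
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        string text = "While walking through the park one day, I noticed an outage " +
            "in the lightbulb at the plant. I talked to an officer about " +
            "restoration protocol for public works, and he said to contact " +
            "the department of public works, but not to expect much because " +
            "they have low efficiency.";
        string[] keys = { "outage", "restoration", "efficiency" };

        long elapsed = TimeIt(() => StringMatches(text, keys), 10000);
        Console.WriteLine("IndexOf with Ordinal: " + elapsed + " ms");
    }
}
```

To compare the other answers, pass each one to TimeIt in the same way and run in a Release build, since Debug builds can distort the ratios.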
Upvotes: 5
Views: 2680
Reputation: 2806
This is a revision of Fabian Bigler's original answer. It is about a 33% speed improvement mostly because of StringComparison.Ordinal.
Here's a link for more info on this: http://msdn.microsoft.com/en-us/library/bb385972.aspx
private int stringMatches(string textToQuery, List<string> stringsToFind)
{
    int count = 0, stringCount = stringsToFind.Count, currentIndex;
    string stringToFind;
    for (int i = 0; i < stringCount; i++)
    {
        currentIndex = 0;
        stringToFind = stringsToFind[i];
        while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
        {
            currentIndex++;
            count++;
        }
    }
    return count;
}
Upvotes: 0
Reputation: 142
private int stringMatches(string textToQuery, string[] stringsToFind)
{
    string[] splitArray = textToQuery.Split(new char[] { ' ', ',', '.' });
    return splitArray.Count(p => stringsToFind.Contains(p));
}
Upvotes: 0
Reputation: 10915
This version counts only whole-word matches in textToQuery.
The idea is to check that the characters immediately before and after each match are not letters, taking care to handle a match at the very start or end of the string.
private int stringMatchesWordsOnly(string textToQuery, string[] wordsToFind)
{
    int count = 0;
    foreach (var wordToFind in wordsToFind)
    {
        int currentIndex = 0;
        while ((currentIndex = textToQuery.IndexOf(wordToFind, currentIndex, StringComparison.Ordinal)) != -1)
        {
            int endIndex = currentIndex + wordToFind.Length; // index just past the match
            if (((currentIndex == 0) || // does the match start the string?
                 (!Char.IsLetter(textToQuery, currentIndex - 1))) &&
                ((endIndex == textToQuery.Length) || // does the match end the string?
                 (!Char.IsLetter(textToQuery, endIndex))))
            {
                count++;
            }
            currentIndex++;
        }
    }
    return count;
}
Conclusion: As you can see, this approach is a bit messier than my other answer and will be somewhat slower (though still faster than the other answers). So it really depends on what you want to achieve. If the strings to find include short words, you should probably use this answer, because a word like 'and' would return too many matches with the first approach (it would also match inside 'band' or 'sand', for example).
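To make the trade-off concrete, here is a small sketch that counts the same word both ways. The class WordMatchDemo and its two method names are made up for this illustration; the whole-word check follows the letter-boundary idea described above and assumes a non-empty search word.

```csharp
using System;

static class WordMatchDemo
{
    // Counts every occurrence of word, including occurrences inside longer words.
    public static int CountSubstrings(string text, string word)
    {
        int count = 0, index = 0;
        while ((index = text.IndexOf(word, index, StringComparison.Ordinal)) != -1)
        {
            index++;
            count++;
        }
        return count;
    }

    // Counts only occurrences that are not directly preceded or followed by a letter.
    public static int CountWholeWords(string text, string word)
    {
        int count = 0, index = 0;
        while ((index = text.IndexOf(word, index, StringComparison.Ordinal)) != -1)
        {
            int end = index + word.Length; // index just past the match
            bool startsClean = index == 0 || !Char.IsLetter(text, index - 1);
            bool endsClean = end == text.Length || !Char.IsLetter(text, end);
            if (startsClean && endsClean)
            {
                count++;
            }
            index++;
        }
        return count;
    }

    public static void Main()
    {
        string text = "A band played in the sand and on the grass.";
        Console.WriteLine(CountSubstrings(text, "and")); // 3: "band", "sand", "and"
        Console.WriteLine(CountWholeWords(text, "and")); // 1: only the standalone "and"
    }
}
```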
Upvotes: 0
Reputation: 10915
That might also be fast:
private int stringMatches(string textToQuery, string[] stringsToFind)
{
    int count = 0;
    foreach (var stringToFind in stringsToFind)
    {
        int currentIndex = 0;
        while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
        {
            currentIndex++;
            count++;
        }
    }
    return count;
}
Upvotes: 6
Reputation: 236318
This LINQ query splits the text on spaces and punctuation symbols and searches for matches, ignoring case:
private int stringMatches(string textToQuery, string[] stringsToFind)
{
    StringComparer comparer = StringComparer.CurrentCultureIgnoreCase;
    return textToQuery.Split(new[] { ' ', '.', ',', '!', '?' }) // add more if needed
                      .Count(w => stringsToFind.Contains(w, comparer));
}
Or with regular expression:
private static int stringMatches(string textToQuery, string[] stringsToFind)
{
    // Regex.Escape keeps characters such as '.' or '+' in the search strings literal
    var pattern = String.Join("|", stringsToFind.Select(s => @"\b" + Regex.Escape(s) + @"\b"));
    return Regex.Matches(textToQuery, pattern, RegexOptions.IgnoreCase).Count;
}
Upvotes: 4
Reputation: 164
I like Tim's answer, but I try to avoid making too many strings to avoid performance issues, and I do like regular expressions, so here's another way to go:
private int StringMatches(string searchMe, string[] keys)
{
    System.Text.RegularExpressions.Regex expression = new System.Text.RegularExpressions.Regex(
        string.Join("|", keys),
        System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    return expression.Matches(searchMe).Count;
}
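One caveat when joining search strings into a regex alternation: any regex metacharacters in the keys are treated as pattern syntax unless they go through Regex.Escape. A minimal sketch of the difference, assuming a key that contains a dot; the class RegexEscapeDemo, the method CountMatches and its escape flag are made-up names for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexEscapeDemo
{
    // Joins the keys into one alternation pattern and counts the matches.
    // When escape is true, each key is matched literally.
    public static int CountMatches(string text, string[] keys, bool escape)
    {
        var parts = keys.Select(k => escape ? Regex.Escape(k) : k);
        string pattern = string.Join("|", parts);
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        string text = "v1.0 and v1x0";
        string[] keys = { "1.0" };
        Console.WriteLine(CountMatches(text, keys, false)); // 2: '.' matches any character, so "1x0" counts too
        Console.WriteLine(CountMatches(text, keys, true));  // 1: only the literal "1.0"
    }
}
```

If the keys are plain words, as in the question, escaping changes nothing, so it is a cheap safeguard.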
Upvotes: 1
Reputation: 460268
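If you want to count the distinct words in the string that also appear in the other collection (note that Split() with no arguments splits on whitespace only, so a word followed by punctuation, such as "efficiency.", will not match):
private int stringMatches(string textToQuery, string[] stringsToFind)
{
    return textToQuery.Split().Intersect(stringsToFind).Count();
}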
Upvotes: 3