Wesley

Reputation: 5621

Occurrences of a List<string> in a string C#

Given

var stringList = new List<string>(new string[] {
                   "outage","restoration","efficiency"});

var queryText = "While walking through the park one day, I noticed an outage " +
                "in the lightbulb at the plant. I talked to an officer about " +
                "restoration protocol for public works, and he said to contact " +
                "the department of public works, but not to expect much because " +
                "they have low efficiency.";

How do I get the total number of occurrences of all strings in stringList within queryText?

In the example above, I would want a method that returns 3:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
    //
}

RESULTS

SPOKE TOO SOON!

I ran a couple of performance tests, and this version of the code from Fabian was faster by a good margin:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
    int count = 0;
    foreach (var stringToFind in stringsToFind)
    {
        int currentIndex = 0;

        while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
        {
            currentIndex++;
            count++;
        }
    }
    return count;
}
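One detail of this loop worth knowing: because currentIndex advances by only one character after each hit, overlapping occurrences are all counted (searching "aaaa" for "aa" gives 3). If non-overlapping matches are wanted instead, a minimal variant advances past the whole match. The class and method names below are hypothetical, and the empty-string guard is an added assumption: advancing by the length of an empty search string (zero) would never terminate.

```csharp
using System;

public static class OccurrenceCounter
{
    // Counts every match, including overlapping ones (same stepping as the loop above).
    public static int CountOverlapping(string text, string[] patterns)
    {
        int count = 0;
        foreach (var pattern in patterns)
        {
            if (string.IsNullOrEmpty(pattern)) continue; // skip empty patterns
            int index = 0;
            while ((index = text.IndexOf(pattern, index, StringComparison.Ordinal)) != -1)
            {
                count++;
                index++; // step one character: overlapping matches are counted
            }
        }
        return count;
    }

    // Counts non-overlapping matches by jumping past each hit.
    public static int CountNonOverlapping(string text, string[] patterns)
    {
        int count = 0;
        foreach (var pattern in patterns)
        {
            if (string.IsNullOrEmpty(pattern)) continue; // a zero-length step would never terminate
            int index = 0;
            while ((index = text.IndexOf(pattern, index, StringComparison.Ordinal)) != -1)
            {
                count++;
                index += pattern.Length; // resume after the whole match
            }
        }
        return count;
    }
}
```

For the question's three words the two variants agree, since none of them can overlap with itself inside the text; the difference only shows up with patterns like "aa" or "abab".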

Execution time for a 10,000-iteration loop, measured with Stopwatch:

Fabian: 37-42 milliseconds

lazyberezovsky StringCompare: 400-500 milliseconds

lazyberezovsky Regex: 630-680 milliseconds

Glenn: 750-800 milliseconds

(I added StringComparison.Ordinal to Fabian's answer for additional speed.)
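The timings above are from the asker's own runs. A minimal sketch of how such a measurement could be set up with System.Diagnostics.Stopwatch follows; the class name, the single untimed warm-up call, and the 10,000-iteration default are assumptions chosen to mirror the description, and absolute numbers will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;

public static class MatchBenchmark
{
    // Runs `matcher` once untimed (warm-up), then times `iterations` further calls
    // and returns the elapsed wall-clock time in milliseconds.
    public static long TimeMatcher(Func<string, string[], int> matcher,
                                   string textToQuery, string[] stringsToFind,
                                   int iterations = 10000)
    {
        matcher(textToQuery, stringsToFind); // warm-up: keeps first-call JIT compilation out of the timing

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            matcher(textToQuery, stringsToFind);
        }
        stopwatch.Stop();

        return stopwatch.ElapsedMilliseconds;
    }
}
```

Any of the string[]-based stringMatches methods on this page can be passed as the matcher (for example, from within the same class, TimeMatcher(stringMatches, queryText, stringList.ToArray())).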

Upvotes: 5

Views: 2680

Answers (7)

Chris

Reputation: 2806

This is a revision of Fabian Bigler's original answer. It gives about a 33% speed improvement, mostly because of StringComparison.Ordinal.

Here's a link for more info on this: http://msdn.microsoft.com/en-us/library/bb385972.aspx

    private int stringMatches(string textToQuery, List<string> stringsToFind)
    {
        int count = 0, stringCount = stringsToFind.Count, currentIndex; // List<T>.Count property, no LINQ needed
        string stringToFind;
        for (int i = 0; i < stringCount; i++)
        {
            currentIndex = 0;
            stringToFind = stringsToFind[i];
            while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
            {
                currentIndex++;
                count++;
            }
        }
        return count;
    }
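Since the question's stringList is a List<string> while most answers take string[], one option is to accept IEnumerable<string> so either collection can be passed without ToArray(). The sketch below also exposes the StringComparison as a parameter, so StringComparison.OrdinalIgnoreCase can be used when case-insensitive matching is wanted (as in the LINQ and regex answers). The class name, the parameter name `comparison`, and its default value are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class FlexibleMatcher
{
    // Accepts any sequence of search strings (List<string>, string[], ...).
    // Overlapping matches are counted, as in the answer above.
    public static int StringMatches(string textToQuery,
                                    IEnumerable<string> stringsToFind,
                                    StringComparison comparison = StringComparison.Ordinal)
    {
        int count = 0;
        foreach (var stringToFind in stringsToFind)
        {
            if (string.IsNullOrEmpty(stringToFind)) continue; // ignore empty entries
            int currentIndex = 0;
            while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, comparison)) != -1)
            {
                currentIndex++;
                count++;
            }
        }
        return count;
    }
}
```

Ordinal and OrdinalIgnoreCase both compare characters directly rather than applying culture rules, which is where the speed-up mentioned above comes from.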

Upvotes: 0

Dejan Ciev

Reputation: 142

private int stringMatches(string textToQuery, string[] stringsToFind)
{
    string[] splitArray = textToQuery.Split(new char[] { ' ', ',', '.' });
    return splitArray.Count(p => stringsToFind.Contains(p)); // Count and Contains come from System.Linq
}
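Contains on a string[] is a linear scan, so each word of the text is compared against every search string. A minimal sketch, assuming exact (case-sensitive) word matches are wanted, puts the search strings into a HashSet<string> for constant-time lookups and drops the empty entries that Split produces between adjacent separators. The class name is hypothetical, and the separator list is carried over from the answer above; it may need extending (for example with newlines or other punctuation).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordSetMatcher
{
    private static readonly char[] Separators = { ' ', ',', '.' };

    public static int StringMatches(string textToQuery, string[] stringsToFind)
    {
        // Build the lookup set once per call; pass StringComparer.OrdinalIgnoreCase
        // to the HashSet constructor instead if case should be ignored.
        var lookup = new HashSet<string>(stringsToFind, StringComparer.Ordinal);

        return textToQuery
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Count(word => lookup.Contains(word));
    }
}
```

With only three search strings the gain is small; the set pays off when the list of words to find is long.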

Upvotes: 0

Fabian Bigler

Reputation: 10915

This version matches only whole words in textToQuery.

The idea is to check that the characters immediately before and after each match are not letters. I also had to handle matches at the very start or end of the string.

private int stringMatchesWordsOnly(string textToQuery, string[] wordsToFind)
{
    int count = 0;
    foreach (var wordToFind in wordsToFind)
    {
        int currentIndex = 0;
        while ((currentIndex = textToQuery.IndexOf(wordToFind, currentIndex, StringComparison.Ordinal)) != -1)
        {
            int endIndex = currentIndex + wordToFind.Length;
            if ((currentIndex == 0 ||                       // is it the first index?
                 !Char.IsLetter(textToQuery, currentIndex - 1)) &&
                (endIndex == textToQuery.Length ||          // has the end been reached?
                 !Char.IsLetter(textToQuery, endIndex)))
            {
                count++;
            }
            currentIndex++;
        }
    }
    return count;
}

Conclusion: As you can see, this approach is a bit messier than my other answer and will be less performant (though still more performant than the other answers). So it really depends on what you want to achieve. If the strings you search for include short words, you should probably use this answer, because with the first approach a word like 'and' would also be counted inside words such as 'band' or 'hand'.
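To make that trade-off concrete, here is a small self-contained comparison, assuming ordinal, case-sensitive matching: with "and" as the search word, the substring approach also counts the "and" inside "band" and "sand", while the whole-word check counts only the standalone word. The class and method names are hypothetical; the end-of-word check compares the end of the match with text.Length so a match at the very end of the string is handled without reading past it.

```csharp
using System;

public static class WordVsSubstring
{
    // Counts all (overlapping) ordinal occurrences, like the first IndexOf answer.
    public static int CountSubstrings(string text, string word)
    {
        if (string.IsNullOrEmpty(word)) return 0;
        int count = 0, index = 0;
        while ((index = text.IndexOf(word, index, StringComparison.Ordinal)) != -1)
        {
            count++;
            index++;
        }
        return count;
    }

    // Counts only occurrences that are not preceded or followed by a letter.
    public static int CountWholeWords(string text, string word)
    {
        if (string.IsNullOrEmpty(word)) return 0;
        int count = 0, index = 0;
        while ((index = text.IndexOf(word, index, StringComparison.Ordinal)) != -1)
        {
            int end = index + word.Length;
            bool startOk = index == 0 || !char.IsLetter(text, index - 1);
            bool endOk = end == text.Length || !char.IsLetter(text, end);
            if (startOk && endOk) count++;
            index++;
        }
        return count;
    }
}
```

On "band and sand", CountSubstrings returns 3 and CountWholeWords returns 1.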

Upvotes: 0

Fabian Bigler

Reputation: 10915

That might also be fast:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
    int count = 0;
    foreach (var stringToFind in stringsToFind)
    {
        int currentIndex = 0;

        while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
        {
            currentIndex++;
            count++;
        }
    }
    return count;
}

Upvotes: 6

Sergey Berezovskiy

Reputation: 236318

This LINQ query splits the text on spaces and punctuation, and counts the matching words, ignoring case:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
   StringComparer comparer = StringComparer.CurrentCultureIgnoreCase;
   return textToQuery.Split(new []{' ', '.', ',', '!', '?'}) // add more separators if needed
                     .Count(w => stringsToFind.Contains(w, comparer));
}

Or with a regular expression:

private static int stringMatches(string textToQuery, string[] stringsToFind)
{
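    // Regex.Escape makes characters such as '.' or '+' in the search strings match literally
    var pattern = String.Join("|", stringsToFind.Select(s => @"\b" + Regex.Escape(s) + @"\b"));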
    return Regex.Matches(textToQuery, pattern, RegexOptions.IgnoreCase).Count;
}
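Both regex answers rebuild the pattern on every call. If the same set of search strings is queried many times (as in the 10,000-iteration benchmark in the question), one option is to build the Regex once and reuse it. The sketch below assumes whole-word, case-insensitive matching as in the answer above, escapes each search string with Regex.Escape, and uses RegexOptions.Compiled; whether Compiled pays off depends on how often the instance is reused. The class name WordMatcher is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public sealed class WordMatcher
{
    private readonly Regex _regex;

    public WordMatcher(string[] stringsToFind)
    {
        // Produces \b(?:a|b|c)\b : escaped alternatives wrapped in word boundaries.
        string alternatives = string.Join("|", stringsToFind.Select(s => Regex.Escape(s)));
        _regex = new Regex(@"\b(?:" + alternatives + @")\b",
                           RegexOptions.IgnoreCase | RegexOptions.Compiled);
    }

    // Counts non-overlapping whole-word matches in the given text.
    public int Count(string textToQuery) => _regex.Matches(textToQuery).Count;
}
```

Grouping the alternatives with (?:...) applies the \b anchors once to the whole alternation instead of repeating them per word.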

Upvotes: 4

Glenn Cuevas

Reputation: 164

I like Tim's answer, but I try to avoid creating too many strings, for performance reasons, and I do like regular expressions, so here's another way to go:

private int StringMatches(string searchMe, string[] keys)
{
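    // Escape each key so characters such as '.' or '+' are matched literally
    System.Text.RegularExpressions.Regex expression = new System.Text.RegularExpressions.Regex(
        string.Join("|", Array.ConvertAll(keys, k => System.Text.RegularExpressions.Regex.Escape(k))),
        System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    return expression.Matches(searchMe).Count;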
}

Upvotes: 1

Tim Schmelter

Reputation: 460268

If you want to count the distinct words in the string that are also in the other collection:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
    return textToQuery.Split().Intersect(stringsToFind).Count();
}
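Intersect is a set operation: it yields each matching word once, so a word that appears several times in the text is counted only once. Split() with no arguments splits on whitespace only, so a word followed by punctuation (like "efficiency." in the question) does not match either way. If every occurrence should be counted, a minimal sketch, assuming the same whitespace splitting, filters instead of intersecting; the class and method names are hypothetical.

```csharp
using System.Linq;

public static class WordCounts
{
    // Distinct words of the text that are also in stringsToFind (the Intersect approach).
    public static int DistinctMatches(string textToQuery, string[] stringsToFind) =>
        textToQuery.Split().Intersect(stringsToFind).Count();

    // Every whitespace-separated word that is in stringsToFind, repeats included.
    public static int AllMatches(string textToQuery, string[] stringsToFind) =>
        textToQuery.Split().Count(word => stringsToFind.Contains(word));
}
```

For "outage outage restoration", DistinctMatches returns 2 while AllMatches returns 3.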

Upvotes: 3
