ElHaix

Reputation: 12986

Keyword proximity matching - options?

I have an array of keywords. I want to find each keyword match within a given string and return x words before and after it.

I could write a loop that goes through the array, finds the index of each match, and builds concatenated substrings around those positions, but this seems a bit lengthy.

I've heard of Lucene, but I'm not sure that bringing in an entire framework is worth it for this. Also, if it is possible, how could I accomplish this with Lucene?

Thanks.

Upvotes: 0

Views: 569

Answers (1)

agent-j

Reputation: 27913

Perhaps regular expressions would help. This builds a list of matches, each consisting of up to 3 words before the keyword, the keyword itself, and up to 3 words after it.

Edit: I missed a couple of 0s and some @s. Try again.

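// Requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
private static List<string> GetMatches(string s)
{
   string[] keywords = {"if", "while", "do"};
   int x = 3; // words before and after
   // Escape each keyword so any regex metacharacters in it are matched literally.
   string alternation = string.Join("|", keywords.Select(k => Regex.Escape(k)));
   // Up to x leading words, the keyword, then up to x trailing words.
   // The trailing group is (\W+\w+) so a keyword or final word at the very end of the string still matches.
   string ex =
      @"(?:\w+\W+){0," + x + @"}\b(?:" + alternation + @")\b(?:\W+\w+){0," + x + @"}";
   Regex regex = new Regex(ex);
   List<string> matches = new List<string>();
   foreach (Match match in regex.Matches(s))
   {
      matches.Add(match.Value);
   }
   return matches; // return the list so the caller can use it
}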
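
One caveat with the regex: Regex.Matches does not return overlapping matches, so when two keywords sit within x words of each other, the second one can be swallowed by the first match's context. If that matters, the loop-based approach from the question is not much longer. Here is a rough sketch, assuming that splitting on \w+ is an acceptable definition of a "word" and that case-sensitive comparison is what you want; the KeywordContext class and its Find method are just names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordContext
{
    // Returns one snippet per keyword occurrence: up to `window` words before,
    // the keyword itself, and up to `window` words after.
    public static List<string> Find(string text, IEnumerable<string> keywords, int window)
    {
        // Tokenize into words; punctuation and whitespace are dropped (assumption).
        string[] words = Regex.Matches(text, @"\w+")
                              .Cast<Match>()
                              .Select(m => m.Value)
                              .ToArray();

        // Case-sensitive lookup (assumption), matching the regex version's behavior.
        var lookup = new HashSet<string>(keywords, StringComparer.Ordinal);
        var snippets = new List<string>();

        for (int i = 0; i < words.Length; i++)
        {
            if (!lookup.Contains(words[i])) continue;

            // Clamp the window to the bounds of the word array.
            int start = Math.Max(0, i - window);
            int end = Math.Min(words.Length - 1, i + window);
            snippets.Add(string.Join(" ", words, start, end - start + 1));
        }
        return snippets;
    }
}
```

Calling KeywordContext.Find("do this and do that", new[] { "do" }, 1) yields two snippets, "do this" and "and do that". Because every occurrence gets its own window, neighbouring keywords no longer hide each other. The trade-off is that the original punctuation and spacing are lost, since the snippets are rebuilt from the word list.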

Upvotes: 2
