Reputation: 6432
For example, I have a list of terms and a string:
var terms = new[] { "programming language", "programming", "language" };
var content = "A programming language is a formal language that "
+ "specifies a set of instructions that can be used to "
+ "produce various kinds of output.";
If I sum Regex.Matches(content, term).Count over all the terms, I get 4 matches in the string.
But some of these overlap: the matches for "programming" and the first "language" lie inside the "programming language" match, so there should be only 2 occurrences.
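To make the double counting concrete, here is a minimal sketch of the per-term count described above. The class and method names (NaiveCount, CountPerTerm) are made up for illustration, and Regex.Escape is added as an assumption so that terms are treated as literal text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NaiveCount
{
    // Hypothetical helper: counts each term independently and sums the results,
    // so a match for "programming" inside "programming language" is counted again.
    public static int CountPerTerm(string content, string[] terms)
    {
        return terms.Sum(term => Regex.Matches(content, Regex.Escape(term)).Count);
    }
}
```

With the terms and content from the question this adds up to 1 + 1 + 2 = 4, because "programming language" matches once, "programming" once, and "language" twice.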
My current solution is to save the start and end index of each occurrence, then check each new match against the saved occurrences to see whether it falls within a range that has already been counted. Is there a better way that doesn't involve tracking start and end indexes?
Upvotes: 0
Views: 259
Reputation: 6432
Following suggestions from the comments, I have a simple regex solution. It only matches exact whole words, i.e. programming language is counted but programming languages is not:
var pattern = @"(?<!\S)programming language(?![^\s])|(?<!\S)programming(?![^\s])|(?<!\S)language(?![^\s])";
var count = Regex.Matches(content, pattern).Count;
Note: this pattern only works when programming language is listed before programming and language, because regex alternation tries the alternatives from left to right and takes the first one that matches. If anyone can contribute a better solution, please do so.
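One way to avoid depending on the order of the terms might be to build the pattern from the list, sorted longest first. This is only a sketch: TermCounter and CountTerms are hypothetical names, and it assumes the terms are plain text (hence Regex.Escape) and that "whole word" means delimited by whitespace or the ends of the string, as in the pattern above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TermCounter
{
    // Hypothetical helper, not part of the original answer.
    public static int CountTerms(string content, string[] terms)
    {
        // An empty alternation would match the empty string, so bail out early.
        if (terms.Length == 0) return 0;

        // Longest terms first, so "programming language" is tried before
        // "programming" and "language" at the same position.
        var alternatives = terms
            .OrderByDescending(t => t.Length)
            .Select(t => Regex.Escape(t));

        // (?<!\S) and (?!\S) require whitespace or a string boundary on each side.
        var pattern = @"(?<!\S)(?:" + string.Join("|", alternatives) + @")(?!\S)";

        // Regex.Matches returns non-overlapping matches, so each stretch of
        // text is counted at most once.
        return Regex.Matches(content, pattern).Count;
    }
}
```

Since the sort happens inside the helper, passing the terms as { "language", "programming", "programming language" } should give the same count of 2 for the content in the question.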
Upvotes: 1