Count occurrences of substrings within a string without counting duplicates

Question

For example, I have a list of terms and a string:

var terms = new[] { "programming language", "programming", "language" };

var content = "A programming language is a formal language that "
    + "specifies a set of instructions that can be used to "
    + "produce various kinds of output.";

I can call Regex.Matches(content, term).Count for each term and find that the terms appear 4 times in total:

"programming language": 1 time
"programming": 1 time
"language": 2 times
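A minimal sketch of that per-term counting, assuming .NET's System.Text.RegularExpressions; the class and method names are hypothetical. Each term is counted on its own, so nothing prevents a shorter term from being counted again inside a longer one. Regex.Escape is an addition so that terms are matched as literal text.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class NaiveTermCount
{
    // Counts every term independently, as described in the question.
    // Matches of different terms are never compared, so "programming"
    // inside "programming language" is counted a second time.
    public static int CountEach(string content, string[] terms) =>
        terms.Sum(term => Regex.Matches(content, Regex.Escape(term)).Count);
}
```

For the example content and terms this returns 4, the sum of the three counts listed above.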

But some of these matches overlap: "programming" and the first "language" are part of "programming language", so there should be only 2 occurrences.

My current solution is to save the start and end index of each occurrence, then check each new occurrence against the saved ones to see whether it falls within a range that has already been counted. Is there a better way that doesn't use start and end indexes?
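The start/end-index approach can be sketched roughly as follows. This is an illustration rather than the asker's actual code: the names are hypothetical, and it assumes longer terms should win, so they are processed first and any later match that overlaps an already accepted range is skipped. Like Regex.Matches(content, term), it does not check word boundaries.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RangeDedupCount
{
    // Records the range of every accepted occurrence and skips any
    // later occurrence that overlaps one of them.
    public static int Count(string content, IEnumerable<string> terms)
    {
        var accepted = new List<(int Start, int End)>(); // End is exclusive

        // Assumption: longer terms take priority, so they are matched first.
        foreach (var term in terms.OrderByDescending(t => t.Length))
        {
            foreach (Match m in Regex.Matches(content, Regex.Escape(term)))
            {
                int start = m.Index, end = m.Index + m.Length;
                // Two half-open ranges overlap when each starts before the other ends.
                bool overlaps = accepted.Any(r => start < r.End && r.Start < end);
                if (!overlaps)
                    accepted.Add((start, end));
            }
        }
        return accepted.Count;
    }
}
```

Because the terms are sorted inside the method, the example content gives 2 regardless of the order of the term list. The overlap check scans all accepted ranges, which is fine for short texts but grows quadratically with the number of matches.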

MiP · Accepted Answer

Following suggestions from the comments, I have a simple regex-based solution. It matches exact whole words only, i.e. "programming language" is counted but "programming languages" is not:

var pattern = @"(?



Note: this pattern only works when "programming language" is placed before the "programming" and "language" terms in the pattern. If anyone can contribute a better solution, please do so.
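The ordering restriction comes from how .NET evaluates alternation: at each position the alternatives are tried left to right and the first one that matches wins, and Regex.Matches resumes after the end of each match, so matches never overlap. A minimal sketch that lifts the restriction builds the alternation from the term list, sorted longest first. It is not necessarily the pattern from the answer above, which is cut off; the helper names are hypothetical, and \b only behaves as intended when each term starts and ends with a word character.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationCount
{
    // Builds \b(?:term1|term2|...)\b with the longest terms first, so a
    // longer term is tried before any shorter term it contains.
    public static Regex BuildPattern(IEnumerable<string> terms)
    {
        var alternatives = terms
            .OrderByDescending(t => t.Length)   // longest alternatives first
            .Select(t => Regex.Escape(t));      // treat terms as literal text
        return new Regex(@"\b(?:" + string.Join("|", alternatives) + @")\b");
    }

    // Regex.Matches continues after each match, so matches cannot overlap.
    public static int Count(string content, IEnumerable<string> terms) =>
        BuildPattern(terms).Matches(content).Count;
}
```

With the sorting in place, the example content gives 2 for any order of the term list. In "programming languages" the longer term fails at the trailing \b, so only "programming" is counted there.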
