Aaginor

Reputation: 4782

Find any literal with a Regular Expression

In my C# program, I have a regular-expression text parser that finds all occurrences of words surrounded by double square brackets. For instance, [[anything]] would find the word anything.

In a second step, I want to count how often the found word (in my example: anything) appears in the whole text. To do this, I try to create a regex that contains the found word and count how many matches I get. The problem is that the found word can also contain special characters, and the following regex:

string foundWord = "(anything";
Regex countOccurences = new Regex(foundWord);

will obviously fail when the variable contains special characters like '('. For matching whole expressions, Expresso suggests the following construct:

Regex countOccurences = new Regex("(?(" + foundWord + ")Yes|No)");

but when foundWord is a number in this scenario, like '2009', the regex tries to interpret it as a reference to a group (which is obviously not defined). My text can contain any combination of normal characters, special characters, numbers, etc.

How can I tell the regex to interpret the given string as a literal expression only?

Thanks in advance, Frank

Upvotes: 0

Views: 705

Answers (2)

Brienne Schroth

Reputation: 2457

If you're just trying to count the number of occurrences of a string, why use a regex at all? Just use the basic string methods, Contains(), IndexOf(), whatever makes most sense in C#. If you don't need the extra functionality of a regex, a plain search is simpler. I think

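// text is the full input; foundString is the word to count (needs "using System;")
int count = 0;
int position = text.IndexOf(foundString, StringComparison.Ordinal);
while (position != -1)
{
    count++;
    // Resume one character past the previous hit, so overlapping hits are counted too.
    position = text.IndexOf(foundString, position + 1, StringComparison.Ordinal);
}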

would accomplish it without regexes.
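
One detail worth noting about the loop: advancing by a single character counts overlapping hits, whereas a regex match count would skip past each match. A minimal sketch of a helper that makes this choice explicit is below. The names PlainTextCount, CountOccurrences and allowOverlap are made up for illustration, and ordinal (case-sensitive) comparison is an assumption about what the asker wants.

```csharp
using System;

public static class PlainTextCount
{
    // Counts how often `word` occurs in `text` using a plain ordinal search.
    // allowOverlap = false steps past each whole match (like Regex.Matches does);
    // allowOverlap = true steps one character at a time.
    public static int CountOccurrences(string text, string word, bool allowOverlap = false)
    {
        // An empty search word has no meaningful count, so treat it as zero.
        if (string.IsNullOrEmpty(word)) return 0;

        int count = 0;
        int step = allowOverlap ? 1 : word.Length;
        int position = text.IndexOf(word, StringComparison.Ordinal);
        while (position != -1)
        {
            count++;
            position = text.IndexOf(word, position + step, StringComparison.Ordinal);
        }
        return count;
    }
}
```

For example, CountOccurrences("aaa", "aa") gives 1, while CountOccurrences("aaa", "aa", allowOverlap: true) gives 2. For words like (anything or 2009 the two modes should give the same result, since those words cannot overlap themselves.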

Upvotes: 1

Eddie Sullivan

Reputation: 786

You should escape the literal before building the regular expression with it, using Regex.Escape.

Something like:

Regex countOccurances = new Regex(Regex.Escape(foundWord));
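
To turn that line into an actual count, something along these lines should work. This is only a sketch: the class and method names (LiteralCount, CountLiteral) are invented for the example, and it assumes non-overlapping, case-sensitive matches are what is wanted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralCount
{
    // Counts non-overlapping occurrences of `literal` in `text`,
    // treating every character of `literal` as plain text.
    public static int CountLiteral(string text, string literal)
    {
        // An empty pattern would match at every position, so return zero instead.
        if (string.IsNullOrEmpty(literal)) return 0;

        // Regex.Escape backslash-escapes metacharacters such as '(' so they match literally.
        string pattern = Regex.Escape(literal);
        return Regex.Matches(text, pattern).Count;
    }
}
```

With this, CountLiteral("see [[(anything]] and (anything again", "(anything") returns 2, and a numeric word such as "2009" is matched as plain digits because it is no longer placed inside a conditional construct.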

However, since all you're doing is counting occurrences, a better option is to avoid using a regular expression for the second search at all. Since you don't want any special characters interpreted, it would be easier just to do a plain text search.

Upvotes: 6
