Elroy Flynn

Reputation: 3218

Why is this regex slow? Can it be faster?

re = new Regex("(.*?)someliteraltext(.*?moreliteral)", RegexOptions.Singleline);
re.Match(c);

Note that Singleline is used, so that "." matches newline.

I run this on a chunk of text that is about 100k characters and it runs for minutes.

Can it be faster?

Upvotes: 0

Views: 505

Answers (2)

MRAB

Reputation: 20664

It's slow because it requires a lot of backtracking. The article here:

http://www.regular-expressions.info/engine.html

might give you some idea of just how much work it's doing.

As @Wrikken suggested, you can speed it up by removing the initial "(.*?)". That capture group captures everything from the start of the string up to "someliteraltext".
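If you still need the text in front of "someliteraltext", you can probably get it back from the match position instead of capturing it. A rough sketch, assuming a made-up sample string in place of your 100k input (the names before and rest are just for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input; the real one is the ~100k-character string c.
string c = "header\nsomeliteraltext body\nmoreliteral trailer";

// Same pattern without the leading (.*?): the match now starts at the literal
// instead of at a lazy wildcard.
var re = new Regex("someliteraltext(.*?moreliteral)", RegexOptions.Singleline);
Match m = re.Match(c);

// What the removed group used to capture: everything before the match.
string before = m.Success ? c.Substring(0, m.Index) : null;
// The remaining group, which still includes "moreliteral".
string rest = m.Success ? m.Groups[1].Value : null;

Console.WriteLine(m.Success ? "captured: " + rest : "no match");
```

Match.Index is the offset where the match begins, so the Substring call returns the same text the first group would have held.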

Alternatively, use "IndexOf" to find "someliteraltext" and then "moreliteral" after it. That should be faster.
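Something along these lines should do it with no regex at all. This is only a sketch: the sample string is made up, and I'm assuming ordinal (plain character) comparison is what you want for the literals.

```csharp
using System;

// Hypothetical sample input; the real one is the ~100k-character string c.
string c = "header\nsomeliteraltext body\nmoreliteral trailer";

const string first = "someliteraltext";
const string second = "moreliteral";

string before = null, rest = null;

// Ordinal comparison: plain character matching, no culture rules.
int i = c.IndexOf(first, StringComparison.Ordinal);
if (i >= 0)
{
    int afterFirst = i + first.Length;
    // Only look for the second literal after the end of the first one.
    int j = c.IndexOf(second, afterFirst, StringComparison.Ordinal);
    if (j >= 0)
    {
        // Equivalent of group 1: everything before "someliteraltext".
        before = c.Substring(0, i);
        // Equivalent of group 2: text after the first literal, up to and including "moreliteral".
        rest = c.Substring(afterFirst, j + second.Length - afterFirst);
    }
}

Console.WriteLine(before == null ? "no match" : "captured: " + rest);
```

Each IndexOf call is a single forward scan, so the worst case is two passes over the string.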

Upvotes: 2

Joel Rondeau

Reputation: 7586

I agree with the comments that the leading (.*?) is most likely what slows it down the most. If you have 1000 characters in front of the first "someliteraltext", that's already 1001 matches of that portion of the regex. @CodeInChaos' suggestion of prefixing the pattern with ^ (beginning of string) is a quick way to limit those matches. If that isn't acceptable, you'll need to explain more about what you're trying to do with the matches to get a better answer.
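For reference, the anchored version would look roughly like this. The sample string is made up; the pattern and options are the ones from the question with ^ added in front.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input; the real one is the ~100k-character string c.
string c = "header\nsomeliteraltext body\nmoreliteral trailer";

// ^ without RegexOptions.Multiline matches only at the very start of the string,
// so the pattern can only match beginning at position 0.
var re = new Regex("^(.*?)someliteraltext(.*?moreliteral)", RegexOptions.Singleline);
Match m = re.Match(c);

string before = m.Success ? m.Groups[1].Value : null;
string rest = m.Success ? m.Groups[2].Value : null;

Console.WriteLine(m.Success ? "captured: " + rest : "no match");
```

Don't add RegexOptions.Multiline here, since that makes ^ match at the start of every line and brings the extra starting positions back.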

Upvotes: 3
