Reputation: 55
I have a regex that runs fine on http://gskinner.com/RegExr/ and http://regexhero.net/tester/ which is .net. However it timeouts (1h+) in my .net v4.5 code.
(?<OuterDescription>[ \t]*--[ \t]+Description:[ \t]+(?!\<Description)(?<Description>\S[^\r\n]{1,})((\r\n|\r|\n)(?![ \t]*--[ \t]*Modified)[^\r\n]*)*)
with sample data:
-- ========================================================================================================
-- Author: A Name
-- Create date: 11/26/2012
-- Description: A description
-- A multiline description
-------------------------------------- Group Name -----------------------------------------
-- More details
-- More details
--
-- Modified: 01/7/2012 - Some reason
-- Modified: 12/7/2012 - Some other reason
-- ========================================================================================================
my code looks like this
var isMatch = new Regex(pattern, RegexOptions.None, TimeSpan.FromMinutes(1)).IsMatch(_fileText);
hoping OuterDescription captures from -- Description
to just before -- Modified
I've narrowed it down to the [^\r\n]*
near the end. I do not know how to fix this to not timeout in c#
Thanks for the discussion and answer. It helped move the timeout out of the description. Unfortunately I am still having problems. This is what I have so far
[ \t]*--[ \t]+={3,}
(\r\n|\n|\r)
(?<OuterAuthor>[ \t]*--[ \t]+
Author:[ \t]+
(?!\<Author)
(?<Author>\S[^\r\n]+))
(\r\n|\n|\r)
(?<OuterCreateDate>[ \t]*--[ \t]+
Create\ [Dd]ate:[ \t]+
(?!\<Create)
(?<CreateDate>\S[^\r\n]{1,}))
(\r\n|\n|\r)
(?<OuterDescription>[ \t]*--[ \t]+
Description:[ \t]+
(?!\<Description)
(?<Description>\S[^\r\n]+)
(?<MultilineDescription>((\r\n|\r|\n)|[^\r\n]*)*?)
(?=(
[ \t]*--[ \t]*Modified)|(
[ \t]*--[ \t]*={3,})
))
That works fine but as soon as I add something after this it will timeout.
Sorry I did not mention this first, I assumed one little greedy star change would be all my problems. To get a sense of the final picture, I have an isAdded bool that will determine whether I check for modified lines (the same way as description) then end with the header/footer. like so
var entireCommentHeaderNamedGroupsRegex = headerFooterRegex + newlineRegex
+ authorRegex + newlineRegex
+ createDateRegex + newlineRegex
+ descriptionRegex + newlineRegex
+ (_isAdded ? modifiedRegex + newlineRegex : "")
+ headerFooterRegex;
some more sample data for when it is not modified:
-- =============================================
-- Author: Garrett Carson
-- Create date: 10/4/2013
-- Description: This is a test
-- =============================================
CREATE PROCEDURE dbo.ThisIsATest
AS
BEGIN
PRINT 'This is a test'
END
Also, as mentioned in the comments, I am fairly new to regular expressions (on this scale) so excuse my terminology if this is not actually catastrophic backtracking.
As a final edit, I ended up going with a poor man's fsm
string currentState = "LookForAuthor"
foreach (var line in lines) {
switch currentState {
case "LookForAuthor" : {
... use author regex ... save to author variable ...
if(found) currentState = "LookForCreateDate"
else throw new InvalidCommentException();
}
case "LookForCreateDate": {
... use createDate regex ... save to createDate variable ...
...
}
...
}
}
if (!_isAdded && !(currentState == "Modified-FirstLine" || currentState == "Modified-MoreLines")) {
throw new InvalidCommentException();
}
I then re factored to using enums. The bite sized regexes applied line by line no longer cause timeouts.
Upvotes: 4
Views: 536
Reputation: 3225
The following seems to work for me (using RegexOptions.IgnorePatternWhitespace
for clarity):
@"(?<OuterDescription>[ \t]*--[ \t]+
Description:[ \t]+
(?!\<Description)
(?<Description> \S[^\r\n]{1,})
(?<MultilineDescription>(\r?\n|[^\r\n]*)*?)
(?=[ \t]*--[ \t]*Modified)
)";
In general, nesting greedy quantifiers can lead to the problem you're experiencing. Unfortunately I'm too tired to investigate it in depth, but if you're curious about what's going wrong, I can make a note to look into it later
Upvotes: 2