Sam Holder

Reputation: 32936

Why does this regular expression pattern cause the parser to hang given certain input?

I have a regex that parses a TNS names file, but it hangs on certain TNSNames files. The problem has been tracked down to whether the string being matched has a space after the HOST= part. Setting aside the appropriateness of the pattern and how to fix the issue (this has been dealt with), what I want to know is why the change in input causes the application to hang: the Regex.Match(invalid) call never returns.

string valid = "SOMENAME = (DESCRIPTION= " + 
                "(ADDRESS= (PROTOCOL=TCP) (HOST = localhost) (PORT=1521) ) " + 
                "(CONNECT_DATA= (SERVICE_NAME=ABC)))";

string invalid = "SOMENAME = (DESCRIPTION= " + 
                "(ADDRESS= (PROTOCOL=TCP) (HOST =localhost) (PORT=1521) ) " + 
                "(CONNECT_DATA= (SERVICE_NAME=ABC)))";

Regex regex = new Regex("SOMENAME" + @"[^=]*=(\s|[^H]*)*HOST\s*=\s(?<host>[^\)]*)\s*\)", RegexOptions.Multiline | RegexOptions.IgnoreCase);

// This line is fine
Match match = regex.Match(valid);

// This line causes Visual Studio to hang
match = regex.Match(invalid);

Upvotes: 2

Views: 262

Answers (1)

Tim Pietzcker

Reputation: 336138

This is most certainly caused by catastrophic backtracking, and the culprit is

(\s|[^H]*)*

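because \s and [^H] can match the same content, and because you've nested two unbounded quantifiers. As long as the rest of the pattern matches, this goes unnoticed. With "HOST =localhost", however, the \s after the = has nothing to match, so the engine backtracks and tries every possible way of dividing the preceding text between the two alternatives and the repetitions of the group. The number of combinations grows exponentially with the length of that text.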
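One way to observe this without freezing the process is to give the regex a match timeout. The following is a minimal sketch, not part of the original code: the class name, the TimesOut helper and the 2-second limit are assumptions chosen for illustration, while the pattern and the input string are copied from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class BacktrackingDemo
{
    // Hypothetical helper: returns true if matching was abandoned at the timeout.
    public static bool TimesOut(string input, TimeSpan timeout)
    {
        // Pattern from the question; the timeout argument is the only addition.
        var regex = new Regex(
            "SOMENAME" + @"[^=]*=(\s|[^H]*)*HOST\s*=\s(?<host>[^\)]*)\s*\)",
            RegexOptions.Multiline | RegexOptions.IgnoreCase,
            timeout);
        try
        {
            regex.Match(input);
            return false;
        }
        catch (RegexMatchTimeoutException)
        {
            // Thrown when the engine is still backtracking once the limit is reached.
            return true;
        }
    }

    static void Main()
    {
        string invalid = "SOMENAME = (DESCRIPTION= " +
                         "(ADDRESS= (PROTOCOL=TCP) (HOST =localhost) (PORT=1521) ) " +
                         "(CONNECT_DATA= (SERVICE_NAME=ABC)))";

        // Assumed limit of 2 seconds; any short interval should do.
        Console.WriteLine("Timed out: " + TimesOut(invalid, TimeSpan.FromSeconds(2)));
    }
}
```

Given the behaviour described in the question, the call with the invalid string would be expected to hit the timeout rather than return a result.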

Since every whitespace character is already covered by [^H], [^H]* alone matches exactly the same content and is not prone to this kind of backtracking, so try this:

Regex regex = new Regex("SOMENAME" + @"[^=]*=([^H]*)HOST\s*=\s(?<host>[^\)]*)\s*\)", RegexOptions.Multiline | RegexOptions.IgnoreCase);
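
As a quick check, the simplified pattern can be run against both strings from the question. This is only a sketch: the class name and the ExtractHost helper are made up for illustration, and the pattern is the one given above. Note that the second string still does not match, because the pattern requires whitespace after the =, but the failure is reported straight away instead of hanging.

```csharp
using System;
using System.Text.RegularExpressions;

static class FixedPatternDemo
{
    // Simplified pattern from the answer: the nested (\s|[^H]*)* is replaced by ([^H]*).
    static readonly Regex HostRegex = new Regex(
        "SOMENAME" + @"[^=]*=([^H]*)HOST\s*=\s(?<host>[^\)]*)\s*\)",
        RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the captured host, or null when there is no match.
    public static string ExtractHost(string input)
    {
        Match match = HostRegex.Match(input);
        return match.Success ? match.Groups["host"].Value : null;
    }

    static void Main()
    {
        string valid = "SOMENAME = (DESCRIPTION= " +
                       "(ADDRESS= (PROTOCOL=TCP) (HOST = localhost) (PORT=1521) ) " +
                       "(CONNECT_DATA= (SERVICE_NAME=ABC)))";
        string invalid = "SOMENAME = (DESCRIPTION= " +
                         "(ADDRESS= (PROTOCOL=TCP) (HOST =localhost) (PORT=1521) ) " +
                         "(CONNECT_DATA= (SERVICE_NAME=ABC)))";

        Console.WriteLine(ExtractHost(valid) ?? "(no match)");   // prints "localhost"
        Console.WriteLine(ExtractHost(invalid) ?? "(no match)"); // prints "(no match)"
    }
}
```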

Upvotes: 4
