Simon Corcos

Reputation: 1022

Regular expression takes an unusual amount of time?

Here's the code I'm running:

Dim descriptionMatches As MatchCollection = Regex.Matches(pageJSON, "\[\[(([\w]+[\s]*)+)\]\], (([\w]+[\s]*)+)\\n")
Console.WriteLine(descriptionMatches.Count)

Everything works fine until the last line. Reading MatchCollection.Count seems to take a very long time to execute: I've let the program run for more than 2 minutes without it finishing.

Here's some additional information.

[[Name of item]], description of item\n

I've used regular expressions a lot in the past and this has never happened to me. If someone knows what the problem is, could you please tell me what it is and how to fix it?

Upvotes: 1

Views: 91

Answers (2)

Floris

Reputation: 46375

You want to match two [[ followed by something followed by two ]]. Make it simple for yourself:

\[\[([^][]+)\]\], (.*?)\\n\*

See it at work at http://regex101.com/r/kK5rO4

Explanation:

\[\[       find two literal [[ in a row
([^][]+)   match at least one character that is not ] or [ (note - the order matters)
           and "save" that match (so you can pull it out later)
\]\]       all the fun stops when you hit two closing brackets
           (but since the match already said "no closing brackets" there is no backtracking)
,          match comma followed by space
(.*?)      match the least amount you can until you get to…
\\n\*      literal \n* (both the \ and the * need a backslash to escape them)

In regex flavors that use flags you need a g flag to match "all instances", but that's taken care of by the rest of your code: in .NET, Regex.Matches already returns every match.
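
As a rough sketch of how the simplified pattern might be used from VB.NET: the sample input and the module name below are made up for illustration, and the pattern drops the trailing \* and writes the character class as [^\[\]] for readability. Both are assumptions about the data, not something the question confirms.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SimplifiedPatternDemo
    Sub Main()
        ' Hypothetical sample input. In a VB.NET string literal, "\n" is a
        ' literal backslash followed by n, which is what \\n in the pattern matches.
        Dim pageJSON As String = "[[Name of item]], description of item\n[[Other item]], another description\n"

        ' [^\[\]]+ cannot run past a bracket, so there is only one way to match the name.
        Dim pattern As String = "\[\[([^\[\]]+)\]\], (.*?)\\n"

        ' Regex.Matches returns every match; no "g" flag is needed in .NET.
        Dim descriptionMatches As MatchCollection = Regex.Matches(pageJSON, pattern)
        Console.WriteLine(descriptionMatches.Count)

        ' Group 1 is the item name, group 2 is the description.
        For Each m As Match In descriptionMatches
            Console.WriteLine(m.Groups(1).Value & " -> " & m.Groups(2).Value)
        Next
    End Sub
End Module
```

With this sample input the count should come out as 2, and the loop should print one line per item.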

Upvotes: 4

Dean Taylor

Reputation: 41991

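Your regular expression leads to "catastrophic backtracking". The nested quantifiers in ([\w]+[\s]*)+ let the engine split the same run of word characters between iterations in exponentially many ways, and it tries all of them whenever the rest of the pattern fails to match.

Consider rewriting your regex to be more possessive. .NET has no possessive quantifiers such as ++, but it does support atomic groups, (?>...), which have the same effect here.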
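
One way this could look in VB.NET is sketched below. It assumes the item names and descriptions really are just words separated by whitespace, as in the original pattern. The input string and module name are hypothetical, and the match timeout is an extra safety net that needs .NET Framework 4.5 or later; it is not part of the suggestion above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AtomicGroupDemo
    Sub Main()
        ' Hypothetical input: "[[" followed by a long run of word characters with
        ' no closing "]]", the kind of near-miss that makes nested quantifiers
        ' backtrack heavily, followed by one well-formed item.
        Dim pageJSON As String = "[[" & New String("a"c, 40) & "[[Name of item]], description of item\n"

        ' (?>...) is an atomic group: once \w+\s* has matched, the engine will not
        ' go back inside it to try shorter splits of the same text.
        Dim pattern As String = "\[\[((?>\w+\s*)+)\]\], ((?>\w+\s*)+)\\n"

        ' The timeout makes a runaway pattern throw instead of hanging.
        Dim rx As New Regex(pattern, RegexOptions.None, TimeSpan.FromSeconds(2))

        Try
            Dim descriptionMatches As MatchCollection = rx.Matches(pageJSON)
            ' Matches are evaluated lazily, so a timeout would surface here.
            Console.WriteLine(descriptionMatches.Count)
        Catch ex As RegexMatchTimeoutException
            Console.WriteLine("Pattern timed out: " & ex.Message)
        End Try
    End Sub
End Module
```

Because the inner groups are no longer capturing, group 1 holds the item name and group 2 the description, rather than groups 1 and 3 as in the original pattern.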

Upvotes: 3
