Reputation: 1022
Here's the code I'm running:
Dim descriptionMatches As MatchCollection = Regex.Matches(pageJSON, "\[\[(([\w]+[\s]*)+)\]\], (([\w]+[\s]*)+)\\n")
Console.WriteLine(descriptionMatches.Count)
Everything works fine until the last line. The MatchCollection.Count property takes so long to evaluate that I've let the program run for more than 2 minutes without it finishing.
Here's some additional information.
When I cut the regex pattern to just "\[\[(([\w]+[\s]*)+)\]\]"
I get 35 matches and it's seemingly instantaneous.
When I loop over the MatchCollection with an indexed loop of the form for i = 0 to matchcollection.Count, the loop body never executes (as if the regex were still trying to analyse the input string). If I use a For Each instead (the difference being that the latter uses an iterator), I get to about the 15th match before it freezes. Weird, isn't it?
Here's a link to the string I'm trying to match; as you will see, it's not the longest string ever: Wikipedia API result for SRS
In the probable case that my pattern is the problem and you want to suggest a new one, what I'm trying to match looks like this:
[[Name of item]], description of item\n
I've used regular expressions a lot in the past and this has never happened to me. If someone knows what the problem is, could you please tell me what it is and how to fix it?
Upvotes: 1
Views: 91
Reputation: 46375
You want to match two [[ followed by something followed by two ]]. Make it simple for yourself:
\[\[([^][]+)\]\], (.*?)\\n\*
See it at work at http://regex101.com/r/kK5rO4
Explanation:
\[\[        find two literal [[ in a row
([^][]+)    match at least one character that is not ] or [ (note: the order matters)
            and "save" that match (so you can pull it out later)
\]\]        all the fun stops when you hit two closing brackets
            (but since the match already said "no closing brackets" there is no costly backtracking)
,           match a comma followed by a space
(.*?)       match the least amount you can until you get to…
\\n\*       literal \n* (both the \ and the * need a backslash to escape them)
In regex flavors that use flags you would need the g flag to match "all instances", but Regex.Matches already returns every match, so that's effectively taken care of by the rest of your code.
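For reference, here is a minimal sketch of how that pattern could be dropped into the VB.NET code from the question. The sample string is made up for illustration (it is not the real Wikipedia response), and the character class is written as [^\[\]] with both brackets escaped, which should be equivalent to [^][] but is harder to misread:

```vb
Imports System
Imports System.Text.RegularExpressions

Module SimplePatternDemo
    Sub Main()
        ' Hypothetical sample standing in for the real JSON text. VB strings have
        ' no backslash escapes, so "\n" here is a literal backslash followed by n.
        Dim pageJSON As String = "* [[Alpha Item]], first description\n* [[Beta]], second one (with punctuation)\n* "

        ' Same idea as the pattern above; brackets are escaped inside the class.
        Dim pattern As String = "\[\[([^\[\]]+)\]\], (.*?)\\n\*"

        ' Group 1 is the item name, group 2 is the description.
        For Each m As Match In Regex.Matches(pageJSON, pattern)
            Console.WriteLine(m.Groups(1).Value & " => " & m.Groups(2).Value)
        Next
    End Sub
End Module
```

Note that the pattern expects a * after each \n, so a final entry that is not followed by another list item would need the trailing \* made optional.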
Upvotes: 4
Reputation: 41991
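Your regular expression leads to "catastrophic backtracking": the nested quantifiers in (([\w]+[\s]*)+) let the engine split the same run of word characters between iterations in exponentially many ways, and when the rest of the pattern fails to match, it tries every one of them.
Consider rewriting your regex to be more possessive, so that it cannot give back what it has already matched. In .NET, atomic groups (?>...) do this.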
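As a rough sketch of what that could look like (the input string below is invented for the demonstration, and the one-second timeout is an arbitrary choice), the original pattern can be guarded with a match timeout while an atomic-group variant is tried next to it. The Regex constructor overload that takes a TimeSpan requires .NET 4.5 or later:

```vb
Imports System
Imports System.Text.RegularExpressions

Module BacktrackingDemo
    Sub Main()
        ' Hypothetical input: a long run of word characters followed by "!",
        ' which neither \w nor \s can match, so the overall match must fail.
        Dim input As String = "[[Item]], " & New String("a"c, 40) & "!\n"

        ' Original pattern, guarded by a one-second match timeout.
        Dim original As New Regex("\[\[(([\w]+[\s]*)+)\]\], (([\w]+[\s]*)+)\\n",
                                  RegexOptions.None, TimeSpan.FromSeconds(1))
        Try
            ' Matches() is evaluated lazily, so the work happens when Count is read.
            Console.WriteLine(original.Matches(input).Count)
        Catch ex As RegexMatchTimeoutException
            Console.WriteLine("Original pattern timed out")
        End Try

        ' Atomic groups (?>...) keep the engine from re-splitting each word run.
        Dim atomic As New Regex("\[\[((?>\w+\s*)+)\]\], ((?>\w+\s*)+)\\n",
                                RegexOptions.None, TimeSpan.FromSeconds(1))
        Console.WriteLine(atomic.Matches(input).Count)
    End Sub
End Module
```

The atomic version only makes the failure fast. It still finds no match for this input, because the description contains a character that is neither a word character nor whitespace, so descriptions with punctuation need a more permissive pattern for the second group.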
Upvotes: 3