Reputation: 21
I need help optimizing my regex for processing the URL BBCode tag. The regex checks that a URL tag has a valid pattern and does NOT use a whitelisted protocol.
#(\[url=(?:"|&quot;|\'|)(((((?!https|http|ftp|mailto).)*):(//)?)([^\[\]]*))(?:"|&quot;|\'|)\])(.*)(\[/url\])#siU
The regex will ignore:
And match when:
It runs well and has no issues until a user submits a string longer than 10,000 characters, which causes catastrophic backtracking.
Upvotes: 2
Views: 191
Reputation: 18980
Here is a slightly optimized version:
(?:\[url=(?:"|&quot;|\'|)(?:(?:(?:(?:(?!https?|ftp|mailto).)*):(?://)?)(?:(?!"|&quot;|&quote;).)++)(?:"|&quot;|\'|)\])(?:(?!\[/url\]).)++(?:\[/url\])
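The main optimizations here are:
(?:) non-capturing groups, so the engine does not store submatches that are never used
(?:(?!).) a negative lookahead in front of each consumed character, so the repetition stops as soon as the terminator appears
++ possessive quantifiers, which never give back what they have matched and so cut off backtracking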
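To make the three constructs concrete, here is a minimal sketch on a deliberately simplified pattern. It is not the pattern from this answer, and the sample strings and variable names are made up for illustration.

```php
<?php
// Illustrative pattern only; it is simpler than the one in the answer.
//   (?: ... )    non-capturing group: nothing is stored for later reference
//   (?:(?!X).)   one character, provided X does not start at this position
//   ++ and *+    possessive: text that was consumed is never given back
$pattern = '#\[url=(?:[^\]]*+)\](?:(?!\[/url\]).)++\[/url\]#si';

// Hypothetical sample inputs: one well-formed tag, one that is never closed.
$closed   = '[url=http://example.com]link text[/url]';
$unclosed = '[url=http://example.com]' . str_repeat('x', 5000);

$closedResult   = preg_match($pattern, $closed);   // 1: the tag is complete
$unclosedResult = preg_match($pattern, $unclosed); // 0: fails without retrying shorter matches

var_dump($closedResult, $unclosedResult);
```

On the unclosed input the possessive body runs to the end of the string once, finds no closing tag, and the attempt is abandoned instead of being retried with ever shorter bodies.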
If you are going to use this pattern often, it might be worth trying the S (study) PHP regex flag. Judging from its description it is unlikely to help, but it might still be worth a try. I have not tested it.
Regarding your updated sample: it's probably best to do this in a two-step process. First, extract the URL meta tags with a much simpler regex; then use your original regex or the one above to verify the input format.
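A rough sketch of the two-step idea, under some assumptions: the function name findDisallowedUrls and its $allowed parameter are hypothetical, and step 2 uses a plain scheme check rather than the full pattern above, simply to keep the example short. You could run your own verification regex on each extracted piece instead.

```php
<?php
// Hypothetical helper (not from the question or the answer): returns the
// targets of [url=...] tags whose scheme is not in the whitelist.
function findDisallowedUrls(string $text, array $allowed = ['http', 'https', 'ftp', 'mailto']): array
{
    // Step 1: cheap extraction of the attribute value.
    // [^\]]*+ is possessive, so a missing "]" fails fast instead of backtracking.
    if (!preg_match_all('#\[url=([^\]]*+)\]#i', $text, $matches)) {
        return [];
    }

    $bad = [];
    foreach ($matches[1] as $raw) {
        // Strip optional surrounding quotes, plain or HTML-escaped.
        $url = trim(str_replace('&quot;', '"', $raw), "\"'");

        // Step 2: verify each short candidate on its own.
        // Simplified check (an assumption): look only at the leading scheme.
        if (preg_match('#^([a-z][a-z0-9+.-]*):#i', $url, $m)
            && !in_array(strtolower($m[1]), $allowed, true)) {
            $bad[] = $url;
        }
    }
    return $bad;
}

$sample = '[url=http://example.com]ok[/url] [url="javascript:alert(1)"]x[/url]';
print_r(findDisallowedUrls($sample)); // contains only 'javascript:alert(1)'
```

Because the expensive check only ever sees one short attribute value at a time, the total input length no longer matters much for backtracking.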
Upvotes: 1