Aleks

Reputation: 63

Regex requires too many steps

Could you please correct my regex?

I need to match all <img> tags whose src attribute contains ?contextId. For instance, the following string should be matched:

<img xmlns="http://www.w3.org/1999/xhtml" src="http://10.3.34.34:8080/Bilder/pic.png?contextId=qualifier123" alt="Bild" />

I wrote a regular expression that does what I need:

(?i)<img[^>]+? src\s*?=\s*?"(.*?\?contextId.*?)"[^\/]+?\/>

But it seems to me that it takes too many steps to parse (380 in the regex demo).

The input string can be up to 30,000 characters long, and I worry that the Java regex engine may fail with my unoptimized expression.
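For context, here is a minimal sketch of how the expression above might be applied from Java, assuming the whole document is already held in one String. The class and method names (ContextIdImgFinder, findSrcs) are hypothetical. Every backslash is doubled for the Java string literal, and the Pattern is compiled once into a static field: Pattern instances are immutable and safe to share between threads, while Matcher instances are not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContextIdImgFinder {
    // The regex from the question, with backslashes doubled for the Java string literal.
    // (?i) turns on case-insensitive matching for the whole pattern.
    private static final Pattern IMG_WITH_CONTEXT_ID = Pattern.compile(
            "(?i)<img[^>]+? src\\s*?=\\s*?\"(.*?\\?contextId.*?)\"[^\\/]+?\\/>");

    // Hypothetical helper: returns the src value (capturing group 1) of every matching <img> tag.
    static List<String> findSrcs(String html) {
        List<String> srcs = new ArrayList<>();
        Matcher m = IMG_WITH_CONTEXT_ID.matcher(html);
        while (m.find()) {
            srcs.add(m.group(1));
        }
        return srcs;
    }

    public static void main(String[] args) {
        String html = "<img xmlns=\"http://www.w3.org/1999/xhtml\" "
                + "src=\"http://10.3.34.34:8080/Bilder/pic.png?contextId=qualifier123\" alt=\"Bild\" />";
        // Prints [http://10.3.34.34:8080/Bilder/pic.png?contextId=qualifier123]
        System.out.println(findSrcs(html));
    }
}
```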

Upvotes: 0

Views: 648

Answers (2)

Quinn

Reputation: 4504

I made some changes to your regex:

<img.*?src\s*=\s*"([^"]*\?contextId[^"]*)

1) .*? changed to [^"]*    # match characters other than " (double quote) instead of . (any character)
2) "[^\/]+?\/> removed     # there is no need to match this part

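A minimal Java sketch of this pattern, collecting capturing group 1 (the src value) for each match. Pattern.CASE_INSENSITIVE is an addition here, assumed in order to keep the (?i) behavior from the question; the class and method names are hypothetical. Because the trailing "[^\/]+?\/> part is gone, each match now stops just before the closing quote of src instead of running on to />, which is fine when only the URL is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SrcCaptureDemo {
    // The answer's pattern; CASE_INSENSITIVE is an assumption added to mirror the question's (?i).
    private static final Pattern PATTERN = Pattern.compile(
            "<img.*?src\\s*=\\s*\"([^\"]*\\?contextId[^\"]*)",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: collects group 1, i.e. each src value that contains ?contextId.
    static List<String> findSrcs(String html) {
        List<String> srcs = new ArrayList<>();
        Matcher m = PATTERN.matcher(html);
        while (m.find()) {
            srcs.add(m.group(1));
        }
        return srcs;
    }

    public static void main(String[] args) {
        String html = "<img src=\"a.png?contextId=1\" /><img src=\"b.png\" />";
        System.out.println(findSrcs(html)); // [a.png?contextId=1]
    }
}
```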

Upvotes: 1

dron22

Reputation: 1233

This takes 98 steps (regex demo):

<img.*?src="[^"]+\?contextId[^>]+>

This regex assumes that the HTML is well-formed and, in particular, that each img tag has a src attribute.
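A minimal Java sketch of this first pattern, assuming the goal is to collect whole <img ...> tags. The pattern has no capturing group, so Matcher.group() (the entire match) is what gets collected. The class and method names are hypothetical. As written, the pattern expects src=" with no whitespace around the equals sign.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeTagDemo {
    // The answer's first pattern; it matches the whole tag and has no capturing group.
    private static final Pattern PATTERN = Pattern.compile("<img.*?src=\"[^\"]+\\?contextId[^>]+>");

    // Hypothetical helper: returns every matching <img ...> tag in document order.
    static List<String> findTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = PATTERN.matcher(html);
        while (m.find()) {
            tags.add(m.group()); // group() is the same as group(0): the entire match
        }
        return tags;
    }

    public static void main(String[] args) {
        String tag = "<img xmlns=\"http://www.w3.org/1999/xhtml\" "
                + "src=\"http://10.3.34.34:8080/Bilder/pic.png?contextId=qualifier123\" alt=\"Bild\" />";
        // The surrounding <p> element is not part of any match; only the img tag is returned.
        System.out.println(findTags("<p>" + tag + "</p>"));
    }
}
```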

EDIT: 104 steps to capture both the img tag and the src link (regex demo):

(<img.*?src="([^"]+\?contextId[^"]+)"[^>]+>)
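A minimal Java sketch of the edited pattern, pairing group 1 (the whole tag) with group 2 (the src link). Each pair is stored with Map.entry (Java 9+) in a List, so repeated identical tags are kept rather than merged as they would be as keys of a Map. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagAndSrcDemo {
    // The edited pattern from the answer: group 1 = the whole <img> tag, group 2 = the src link.
    private static final Pattern PATTERN =
            Pattern.compile("(<img.*?src=\"([^\"]+\\?contextId[^\"]+)\"[^>]+>)");

    // Hypothetical helper: returns (tag, src) pairs in document order; duplicates are kept.
    static List<Map.Entry<String, String>> findTagsAndSrcs(String html) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        Matcher m = PATTERN.matcher(html);
        while (m.find()) {
            pairs.add(Map.entry(m.group(1), m.group(2)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String html = "<img xmlns=\"http://www.w3.org/1999/xhtml\" "
                + "src=\"http://10.3.34.34:8080/Bilder/pic.png?contextId=qualifier123\" alt=\"Bild\" />";
        for (Map.Entry<String, String> pair : findTagsAndSrcs(html)) {
            System.out.println("tag: " + pair.getKey());
            System.out.println("src: " + pair.getValue());
        }
    }
}
```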

Upvotes: 1
