Reputation: 3
I know RegEx is NOT the ideal tool for searching within HTML. However, it's what I've been given to work with. Note: I'm not looking for something that will be robust across websites. For example, I'm only considering quotation marks, and I'm not worried about apostrophe characters.
Suppose I have the following text:
The quick brown "fox.jpg" jumps "google.com" over the "lazy.png" dog.
I want to match only the image links, "fox.jpg" and "lazy.png", while ignoring "google.com". I could theoretically use a search pattern like
".*?"
that would find all quotes, from which I could simply parse each match to determine whether or not it's an image.
But something like
".*?(jpg|png)"
doesn't work because it returns "fox.jpg" (good) and "google.com" over the "lazy.png" (bad).
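A minimal sketch that reproduces this behavior, assuming Python's re module (the regex flavor is not stated above, so this is only an illustration; the variable names are hypothetical):

```python
import re

text = 'The quick brown "fox.jpg" jumps "google.com" over the "lazy.png" dog.'

# Lazy dot: each match starts at the first available quote and expands
# until (jpg|png)" can match, so the dot is free to cross other quotes.
lazy_dot = re.compile(r'".*?(jpg|png)"')
lazy_dot_matches = [m.group(0) for m in lazy_dot.finditer(text)]

for match in lazy_dot_matches:
    print(match)
```

The second match begins at the opening quote of "google.com" because laziness only shortens where a match ends; it does not move where the match starts.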
So: is there an extra "greedy" setting that I'm missing? Something to tell RegEx that the first quotation mark of the match should be the quotation mark closest to the last quotation mark?
Upvotes: 0
Views: 143
Reputation: 370699
After the first ", try repeating anything but a ", via a negated character set, instead of ., which will (undesirably) match a ":
"[^"]*(jpg|png)"
https://regex101.com/r/PKZLp5/1
With the negated character set, it no longer matters for correctness whether the repetition is lazy or greedy, though when the filename is longer than the file extension, greedy repetition will find a match slightly faster.
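As a rough check, here is a small sketch assuming Python's re module (the flavor is an assumption, and the variable names are made up for illustration). It runs the negated-character-set pattern in both its greedy and lazy forms against the sample text from the question:

```python
import re

text = 'The quick brown "fox.jpg" jumps "google.com" over the "lazy.png" dog.'

# [^"]* can never cross a quotation mark, so every match stays
# inside a single pair of quotes.
greedy = re.compile(r'"[^"]*(jpg|png)"')
lazy = re.compile(r'"[^"]*?(jpg|png)"')

greedy_matches = [m.group(0) for m in greedy.finditer(text)]
lazy_matches = [m.group(0) for m in lazy.finditer(text)]

print(greedy_matches)
print(lazy_matches)
```

Both patterns return the same two quoted image names for this input; "google.com" is skipped because nothing between its quotes ends in jpg or png.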
Upvotes: 4