Matthew

Reputation: 3

RegEx - Searching for specific content in quotes

I know RegEx is not the ideal tool for searching within HTML, but it's what I've been given to work with. Note: I'm not looking for something robust across websites. For example, I'm only considering double quotation marks, and I'm not worried about apostrophes.

Suppose I have the following text:

The quick brown "fox.jpg" jumps "google.com" over the "lazy.png" dog.

I want to find only the image links, matching "fox.jpg" and "lazy.png" while ignoring "google.com". I could theoretically use a search pattern like

".*?"

that would find every quoted string, and then check each match to determine whether it's an image.

But something like

".*?(jpg|png)"

doesn't work because it returns "fox.jpg" (good) and "google.com" over the "lazy.png" (bad).

So: is there an extra "greedy" setting that I'm missing? Something to tell RegEx that the opening quotation mark of the match should be the one closest to the closing quotation mark?

Upvotes: 0

Views: 143

Answers (1)

CertainPerformance

Reputation: 370699

After the opening ", repeat anything but a " by using a negated character class ([^"]) instead of ., which will (undesirably) match a " as well:

"[^"]*(jpg|png)"

https://regex101.com/r/PKZLp5/1

It doesn't matter whether the repetition is lazy or greedy now, because it can no longer cross a quotation mark. When the filename is longer than the file extension, though, greedy repetition will find a match slightly faster.
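
To see the difference side by side, here is a minimal sketch in Python (the question doesn't name a language, so that choice is an assumption, as are the variable names). It runs both patterns against the sample sentence from the question. The group is written as non-capturing, (?:jpg|png), only so that re.findall returns the whole match rather than just the extension.

```python
import re

# Sample text taken from the question.
text = 'The quick brown "fox.jpg" jumps "google.com" over the "lazy.png" dog.'

# Lazy dot: a match may start at any quote and then expands, crossing
# other quotes if needed, until it reaches jpg" or png".
lazy_dot = re.compile(r'".*?(?:jpg|png)"')

# Negated character class: [^"] cannot cross a quote, so every match
# stays inside a single quoted string.
negated = re.compile(r'"[^"]*(?:jpg|png)"')

print(lazy_dot.findall(text))
# ['"fox.jpg"', '"google.com" over the "lazy.png"']

print(negated.findall(text))
# ['"fox.jpg"', '"lazy.png"']
```

Note that neither pattern requires a dot before the extension, so a quoted string such as "notjpg" would also match. If that matters for your input, something like "[^"]*\.(jpg|png)" may be closer to what you want.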

Upvotes: 4
