Regex match similar pattern string but I need the last occurrence

Question

I have the following text

">UNWANTEDTEXT">APRODUCT

I'm looking to build a regex statement with my desired result being the text

APRODUCT

The regex I have at the moment is this.

">(.*?)<\/ProductCode>

The problem I'm facing is that the same "> pattern also occurs at the start... I need a way of telling the regex to look only at the last occurrence of "> and then pull the value between it and the closing </ProductCode> tag.

mwp · Accepted Answer

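A lazy quantifier doesn't help here, because the engine still starts the match at the first "> it finds and only then expands as little as possible. The easiest solution is to say which characters you want to match instead of matching any character; in this case, any character that's not a closing angle bracket: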

([^>]*)<\/ProductCode>
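To see the difference, here is a minimal Python sketch comparing the lazy pattern from the question with the negated character class. The sample string assumes the closing tag from the question's regex follows the text, and the variable names are only for illustration. Python doesn't need the forward slash escaped, so \/ is written as / here.

```python
import re

# Sample text from the question, with the closing tag assumed to follow it.
text = '">UNWANTEDTEXT">APRODUCT</ProductCode>'

# Lazy pattern from the question: the match still begins at the first ">.
lazy = re.search(r'">(.*?)</ProductCode>', text)

# Negated character class: the capture cannot cross a > character.
negated = re.search(r'([^>]*)</ProductCode>', text)

print(lazy.group(1))     # UNWANTEDTEXT">APRODUCT
print(negated.group(1))  # APRODUCT
```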

If the value itself can contain a closing angle bracket (one that isn't preceded by a quotation mark), the solution gets a little hairier. Assuming your regex library supports zero-width assertions:

(?:">)?((?:(?!">).)*)<\/ProductCode>
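As a rough check of the lookahead version, the sketch below runs it in Python against a made-up input where the wanted value contains a bare > character. The input string is hypothetical and not from the question; it is only there to show where the simpler pattern falls short.

```python
import re

# Hypothetical input in which the wanted value itself contains a bare >.
text = '">UNWANTED>TEXT">A>PRODUCT</ProductCode>'

# Each character is consumed only if it does not start a "> sequence,
# so the capture can never span across an earlier ">.
pattern = re.compile(r'(?:">)?((?:(?!">).)*)</ProductCode>')
match = pattern.search(text)
print(match.group(1))  # A>PRODUCT

# The simpler negated class stops at the bare > and loses part of the value.
simple = re.search(r'([^>]*)</ProductCode>', text)
print(simple.group(1))  # PRODUCT
```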

Hope this helps!

I also want to add that if you're parsing SGML, you might consider using a library dedicated to that purpose instead of trying to cobble together your own parser based on regular expressions. That path is fraught with peril.
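For comparison, here is a minimal sketch of the parser route, assuming the data is well-formed XML rather than general SGML. The surrounding Item element and the type attribute are invented for illustration, since the question doesn't show the full document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element layout and the "type" attribute are assumptions.
xml_text = '<Item><ProductCode type="sku">APRODUCT</ProductCode></Item>'

root = ET.fromstring(xml_text)

# findtext returns the text content of the first matching child element.
product_code = root.findtext('ProductCode')
print(product_code)  # APRODUCT
```

The parser handles attributes and quoting on its own, so there is no need to reason about where the last "> falls.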
