Peter H

Reputation: 901

Regex: match a repeated pattern, but only from its last occurrence

I have the following text

">UNWANTEDTEXT">APRODUCT</ProductCode>

I'm trying to build a regex whose result is the text

APRODUCT

The regex I have at the moment is this.

">(.*?)<\/ProductCode>

The problem I'm facing is that the same "> sequence also occurs at the start of the text, so the match begins there. I need a way of telling the regex to look only at the last occurrence of "> and then pull out the value between it and </ProductCode>

Upvotes: 1

Views: 50

Answers (1)

mwp

Reputation: 8467

The easiest solution is to say exactly which characters you want to match instead of matching any character. Here, that means any character that's not a closing angle bracket:

([^>]*)<\/ProductCode>
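To see the difference, here is a minimal sketch in Python's re module (the question doesn't say which regex library is in use, so Python is an assumption, and the function names are just for illustration). It runs the original lazy pattern and the negated character class against the sample text from the question:

```python
import re

# Sample text taken from the question.
TEXT = '">UNWANTEDTEXT">APRODUCT</ProductCode>'

def lazy_match(text):
    # Original pattern: the search starts at the FIRST "> it finds, and the
    # lazy group then grows until the closing tag, so it swallows too much.
    m = re.search(r'">(.*?)</ProductCode>', text)
    return m.group(1) if m else None

def negated_match(text):
    # Negated character class: the capture can never cross a '>', so it can
    # only begin after the last '>' before the closing tag.
    m = re.search(r'([^>]*)</ProductCode>', text)
    return m.group(1) if m else None

print(lazy_match(TEXT))     # UNWANTEDTEXT">APRODUCT
print(negated_match(TEXT))  # APRODUCT
```

Note that the forward slash needs no escaping in Python; the `\/` in the patterns above only matters in flavors where `/` is the regex delimiter.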

If the value itself can contain a closing angle bracket (as long as it isn't preceded by a quotation mark), the solution gets a little hairier. Assuming your regex library supports zero-width assertions:

(?:">)?((?:(?!">).)*)<\/ProductCode>
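As a rough check of that second pattern, again assuming Python's re module (which does support lookahead), the sketch below tries it on the sample text and on a made-up variant where the value contains a bare `>`. The helper name `last_value` and the second input are hypothetical, not from the question:

```python
import re

# (?!">). consumes one character at a time, but only while the next two
# characters are not the "> sequence, so the capture cannot span a ">.
PATTERN = re.compile(r'(?:">)?((?:(?!">).)*)</ProductCode>')

def last_value(text):
    m = PATTERN.search(text)
    return m.group(1) if m else None

print(last_value('">UNWANTEDTEXT">APRODUCT</ProductCode>'))  # APRODUCT
print(last_value('">UNWANTED">A>PRODUCT</ProductCode>'))     # A>PRODUCT
```

Keep in mind that `.` does not match newlines by default, so a value spanning several lines would need the re.DOTALL flag.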

Hope this helps!

I also want to add that if you're parsing SGML, you might consider using a library dedicated to that purpose instead of trying to cobble together your own parser based on regular expressions. That path is fraught with peril.
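For illustration only, here is what that could look like if the data turns out to be well-formed XML. That is an assumption: the question shows just a fragment, so the surrounding `<Item>` element and the `type` attribute below are invented. Python's standard xml.etree.ElementTree then does the work without any regex:

```python
import xml.etree.ElementTree as ET

# Hypothetical complete document; only the ProductCode text comes from the question.
DOC = '<Item><ProductCode type="UNWANTEDTEXT">APRODUCT</ProductCode></Item>'

def product_code(xml_text):
    root = ET.fromstring(xml_text)
    node = root.find('ProductCode')  # first direct child with that tag
    return node.text if node is not None else None

print(product_code(DOC))  # APRODUCT
```

Real SGML that isn't valid XML would need a different parser, so treat this as a sketch of the idea rather than a drop-in fix.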

Upvotes: 2
