I have the following text:
">UNWANTEDTEXT">APRODUCT</ProductCode>
I want to build a regular expression that extracts only this text:
APRODUCT
The regex I have at the moment is:
">(.*?)<\/ProductCode>
The problem is that the same "> sequence also occurs at the start of the string, so the match begins at the first "> and captures UNWANTEDTEXT">APRODUCT instead. I need a way of telling the regex to use only the last occurrence of "> and then pull out the value between it and </ProductCode>.
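To reproduce the problem, here is a quick Python check. Python is only an assumption, since the question doesn't name a language. The \/ escape is needed only where regexes are delimited by slashes (JavaScript or Perl literals). Python accepts it but doesn't require it, so it's dropped here.

```python
import re

text = '">UNWANTEDTEXT">APRODUCT</ProductCode>'

# The pattern from the question: a lazy match between "> and </ProductCode>.
# The engine starts at the leftmost "> and only then expands .*? until
# </ProductCode> is reached, so the capture spans the second "> as well.
match = re.search(r'">(.*?)</ProductCode>', text)
captured = match.group(1)
print(captured)  # UNWANTEDTEXT">APRODUCT
```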
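The easiest solution is to specify which characters you want to match instead of allowing any character. Here, that means any character that isn't a closing angle bracket. (Making .*? lazy doesn't help on its own, because the engine still starts the match at the leftmost "> it can find.)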
([^>]*)<\/ProductCode>
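Here is a minimal Python sketch of the negated character class applied to the sample string. As above, the choice of Python is an assumption, and the slash escape is dropped because Python doesn't need it.

```python
import re

text = '">UNWANTEDTEXT">APRODUCT</ProductCode>'

# [^>]* can never cross a '>', so the only place the pattern can match is
# the run of non-'>' characters directly in front of </ProductCode>.
match = re.search(r'([^>]*)</ProductCode>', text)
product_code = match.group(1)
print(product_code)  # APRODUCT
```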
If the value itself can contain a closing angle bracket that isn't preceded by a quotation mark, the solution gets a little hairier. Assuming your regex library supports zero-width lookahead assertions:
(?:">)?((?:(?!">).)*)<\/ProductCode>
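)?((?:(?!">).)*)<\/ProductCode>">
As a rough check in Python (whose re module supports (?!...) lookahead), the sketch below uses a hypothetical value A>PRODUCT that contains a bare '>'. It compares the lookahead pattern against the simpler [^>]* version, which cuts the value short at that bracket.

```python
import re

# Hypothetical input: the value contains a '>' that is NOT preceded by a quote.
text = '">UNWANTEDTEXT">A>PRODUCT</ProductCode>'

# (?:(?!">).)* consumes one character at a time, but only when that character
# does not start the two-character sequence ">, so the capture can't cross it.
lookahead = re.compile(r'(?:">)?((?:(?!">).)*)</ProductCode>')
product_code = lookahead.search(text).group(1)

# The negated-class version stops at the bare '>' and loses part of the value.
simple_code = re.search(r'([^>]*)</ProductCode>', text).group(1)

print(product_code)  # A>PRODUCT
print(simple_code)   # PRODUCT
```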
Hope this helps!
I'd also add that if you're parsing SGML (or a descendant such as XML or HTML), you should consider using a library dedicated to that purpose instead of cobbling together your own parser from regular expressions. That path is fraught with peril.
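For comparison, here is a sketch of the parser route using Python's standard xml.etree.ElementTree. It assumes the data is well-formed XML, not arbitrary SGML, and the surrounding Products/Product elements and name attribute are hypothetical, since the question shows only a fragment.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed document; the real structure isn't shown in the question.
document = """<Products>
  <Product name="UNWANTEDTEXT">
    <ProductCode>APRODUCT</ProductCode>
  </Product>
</Products>"""

root = ET.fromstring(document)

# iter() walks every descendant element with the given tag name.
codes = [element.text for element in root.iter("ProductCode")]
print(codes)  # ['APRODUCT']
```

The parser handles quoting, attributes and nesting, so there is no need to reason about which "> is the last one.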