tfrederick74656

Reputation: 239

Enforcing match order in a RegEx

I'm parsing some search query text for ISBNs. Each line may contain zero or more ISBN10s, zero or more ISBN13s, and other unrelated digits. The text has been sanitized to contain only [a-zA-Z0-9 ], but there may be whitespace between related digits. I've successfully written some regular expression fragments to parse the text, but I'm unsure how to get them to execute in the order I want.

First, here's a sample line of text from the data:

foo ISBN10 013 284 1649 0132841648 Web ISBN13 9 78013 2841641 9780132841641 2013 bar

I need to (in this order):

  1. Search the entire string for ([9][7][8-9]\d{10}).
  2. Search only things that weren't already matched for (\d{10})
  3. Search only things that weren't already matched for ([9]\s*[7]\s*[8-9]\s*(\s*\d){10})
  4. Search only things that weren't already matched for (\d(\s*\d){9})

This searches for complete ISBN13s, then complete ISBN10s, then fragmented ISBN13s, and finally fragmented ISBN10s. However, if I simply join the expressions with |, the regex engine tries all four alternatives at each character position before moving on, so a fragmented match that starts earlier in the string can consume digits before a complete ISBN further along is ever considered. How do I search the entire string for expression #1 before evaluating #2 at all?

Upvotes: 3

Views: 217

Answers (1)

John Kugelman

Reputation: 361585

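Run four separate passes, one per expression, in the order you listed. In each pass, collect the successful matches and then remove them with replaceAll (i.e., replace them with ""), so that each later pass only sees the text that earlier passes left behind.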
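A minimal sketch of that multi-pass idea, written in Python with re.sub standing in for replaceAll. The function name extract_isbns, the pass labels, and the GUARD lookbehind are assumptions made for this illustration and are not part of the question. The lookbehind is there because, in the sample line, the "10" in "ISBN10" would otherwise be picked up as the start of a fragmented ISBN10.

```python
import re

# Assumption for this sketch: a match may not start right after a letter or
# digit, so the "10" in "ISBN10" is never treated as the start of an ISBN.
GUARD = r"(?<![A-Za-z0-9])"

# The four expressions from the question, in priority order.
PASSES = [
    ("isbn13", GUARD + r"97[89]\d{10}"),
    ("isbn10", GUARD + r"\d{10}"),
    ("isbn13_fragmented", GUARD + r"9\s*7\s*[89](?:\s*\d){10}"),
    ("isbn10_fragmented", GUARD + r"\d(?:\s*\d){9}"),
]


def extract_isbns(text):
    """Run each pattern over the whole string, removing matches between passes."""
    found = {}
    remaining = text
    for name, pattern in PASSES:
        regex = re.compile(pattern)
        # Collect every match for this pass first...
        found[name] = regex.findall(remaining)
        # ...then delete them so later passes cannot match the same digits.
        remaining = regex.sub("", remaining)
    return found, remaining


sample = ("foo ISBN10 013 284 1649 0132841648 Web ISBN13 "
          "9 78013 2841641 9780132841641 2013 bar")
found, remaining = extract_isbns(sample)
for name, matches in found.items():
    print(name, matches)
```

Fragmented matches keep their internal whitespace here; if you need bare digits, stripping the spaces from each match afterwards should be enough. Note that deleting a match can leave unrelated digit groups next to each other, which the fragmented passes might then join, so replacing with a non-digit placeholder instead of "" may be worth considering.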

Upvotes: 1
