Reputation: 239
I'm parsing some search query text for ISBNs. Each line may contain zero or more ISBN10s, zero or more ISBN13s, and other unrelated digits. The text has been sanitized to contain only [a-zA-Z0-9 ], but there may be whitespace between related digits. I've successfully written some regular expression fragments to parse the text, but I'm unsure how to get them to execute in the order I want.
First, here's a sample line of text from the data:
foo ISBN10 013 284 1649 0132841648 Web ISBN13 9 78013 2841641 9780132841641 2013 bar
I need to match the following (in this order):
([9][7][8-9]\d{10})
.(\d{10})
([9]\s*[7]\s*[8-9]\s*(\s*\d){10})
(\d(\s*\d){9})
This accomplishes searching for complete ISBN13s, then complete ISBN10s, then fragmented ISBN13s, and finally fragmented ISBN10s. However, if I simply paste them together separated by |, the RegEx engine wants to evaluate (1,2,3,4) for each character. How do I accomplish searching the entire string for expression #1 before even thinking about evaluating #2?
Upvotes: 3
Views: 217
Reputation: 361585
Perform four replaceAlls and remove the successful matches at each step (i.e., replace them with "").
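A rough sketch of that idea in Java, in case it helps. The class name IsbnPasses and the method extract are made up for illustration. The patterns are adapted from the question's four fragments, with two assumptions of mine: the leading "." on the second fragment is dropped, and a \b is added in front of each so that digits inside a label such as "ISBN10" cannot start a match (without it, the last pass would pick up the "10" from "ISBN10" on the sample line).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IsbnPasses {
    // Priority order: complete ISBN13, complete ISBN10,
    // fragmented ISBN13, fragmented ISBN10.
    // The \b prefix is an assumption, not part of the original fragments.
    private static final Pattern[] PASSES = {
        Pattern.compile("\\b97[89]\\d{10}"),
        Pattern.compile("\\b\\d{10}"),
        Pattern.compile("\\b9\\s*7\\s*[89](?:\\s*\\d){10}"),
        Pattern.compile("\\b\\d(?:\\s*\\d){9}")
    };

    // Runs one full pass per pattern. Matches are collected with internal
    // whitespace stripped, then removed so later passes cannot reuse them.
    public static List<String> extract(String line) {
        List<String> found = new ArrayList<>();
        String remaining = line;
        for (Pattern p : PASSES) {
            Matcher m = p.matcher(remaining);
            while (m.find()) {
                found.add(m.group().replaceAll("\\s+", ""));
            }
            // Matcher.replaceAll resets the matcher before replacing.
            remaining = m.replaceAll("");
        }
        return found;
    }
}
```

Calling IsbnPasses.extract on the sample line from the question should give the complete ISBN13 first, then the complete ISBN10, then the two fragmented ones. Because each pass scans the whole remaining string before the next pattern is tried, the 10-digit pattern never sees the digits of a complete ISBN13.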
Upvotes: 1