BigAl

Reputation: 23

Non-Greedy Regex Fails

My question has 2 parts...

First, I'm trying to extract the FIRST set of numbers separated by a slash ("12/56" in this case), and ignore the 2nd set (if it exists).

Sample String:

some text  12/56    34/67    ABCD1234   --Want to grab "12/56", but ignore "34/67"
more text  14/58             DEFG5678   --Want to grab "14/58".

I've tried using (\d\d\/\d\d)? as the pattern (non-greedy), however it doesn't stop after the first hit.

Second, once the above problem is solved, I still need to grab the 8-digit code after it (there will ALWAYS be an 8-digit code). I'd like to use something like (\d\d\/\d\d)?.+([A-Z0-9]{8}), however I'd think that the correct non-greedy search may stop regex in its tracks. Is this possible?

Upvotes: 1

Views: 829

Answers (3)

stema

Reputation: 92976

Just remove the ? after the first capturing group.

(\d\d\/\d\d).+([A-Z0-9]{8})

See it on Regexr; hovering the mouse over the highlighted match shows the content of the capturing groups.

Explanation:

The ? doesn't make the group "non-greedy"; it makes it optional. Because your lines don't start with a digit, the regex skips the optional group at the start of the line, and the following .+ then matches everything up to your last part.

You don't need "non-greedy" behaviour here: your pattern will match the first occurrence anyway. Also, you can make a quantifier "ungreedy", but not a group.
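
A minimal Python sketch of the difference, assuming the first sample line from the question with its trailing comment removed (the variable names are only illustrative):

```python
import re

line = "some text  12/56    34/67    ABCD1234"

# Optional group: at position 0 the group matches nothing,
# and the greedy .+ then swallows both nn/nn pairs.
optional = re.search(r"(\d\d/\d\d)?.+([A-Z0-9]{8})", line)

# Required group: the engine must advance until the group itself matches.
required = re.search(r"(\d\d/\d\d).+([A-Z0-9]{8})", line)

print(optional.groups())  # (None, 'ABCD1234')
print(required.groups())  # ('12/56', 'ABCD1234')
```

The escaped slash (\/) from the question is not needed in Python, so it is written as a plain / here.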

Upvotes: 1

Tim Pietzcker

Reputation: 336108

(\d\d/\d\d)\s+(?:\d\d/\d\d)?\s*([A-Z0-9]{8})

grabs the first but ignores the second set of nn/nn strings (if present), then grabs the next 8 uppercase ASCII alnum characters, assuming that nothing but whitespace will be between those items.

The results will then be in groups 1 and 2. So, for example in Python, you'd use

import re

# subject is the input line to search
reobj = re.compile(r"(\d\d/\d\d)\s+(?:\d\d/\d\d)?\s*([A-Z0-9]{8})")
match = reobj.search(subject)
if match:
    first = match.group(1)   # first nn/nn pair
    second = match.group(2)  # 8-character code
else:
    print("No match!")
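
As a rough check of the whitespace-only assumption, the pattern can be run against the two sample lines from the question (trailing comments removed) and against a made-up line with extra text between the items. The helper name extract and the third test line are hypothetical, added only for this sketch:

```python
import re

PATTERN = re.compile(r"(\d\d/\d\d)\s+(?:\d\d/\d\d)?\s*([A-Z0-9]{8})")

def extract(line):
    """Return (first nn/nn pair, 8-character code), or None if there is no match."""
    match = PATTERN.search(line)
    return match.groups() if match else None

with_second_pair = extract("some text  12/56    34/67    ABCD1234")
without_second_pair = extract("more text  14/58             DEFG5678")
# Hypothetical line: "note" sits between the pair and the code,
# which the pattern does not allow.
with_extra_text = extract("some text  12/56  note  ABCD1234")

print(with_second_pair)     # ('12/56', 'ABCD1234')
print(without_second_pair)  # ('14/58', 'DEFG5678')
print(with_extra_text)      # None
```

If arbitrary text can appear between the items, the looser .+ variant from the other answers is probably the better fit.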

Upvotes: 0

ziesemer

Reputation: 28687

Which language are you using these regular expressions in? Are you using a "find" or a "match" method? As long as you're using a "match" method, your last example (the "something like") should almost work as you'd expect - but I'd remove the ? after the first grouping of digits, unless you have a specific need for it:

(\d\d/\d\d).+([A-Z0-9]{8})

With the "match" method, both groups must be populated for the match to succeed.
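
How "find" and "match" behave depends on the language, which the question does not state. As one illustration only, in Python's re module re.search scans the whole string, while re.match is anchored at the start, so an anchored match needs a leading .*? to skip the text before the digits. A small sketch, assuming the first sample line without its trailing comment:

```python
import re

line = "some text  12/56    34/67    ABCD1234"
pattern = r"(\d\d/\d\d).+([A-Z0-9]{8})"

# re.search scans forward and finds the first nn/nn pair.
found = re.search(pattern, line)

# re.match only tries position 0, where "some text" is not a digit pair.
anchored = re.match(pattern, line)

# A lazy .*? prefix lets the anchored match skip the leading text.
anchored_with_prefix = re.match(r".*?" + pattern, line)

print(found.groups())                 # ('12/56', 'ABCD1234')
print(anchored)                       # None
print(anchored_with_prefix.groups())  # ('12/56', 'ABCD1234')
```

Other regex APIs may differ; for example, a method that requires the whole input to match would also need the pattern to cover any trailing text.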

Upvotes: 0
