Geo

Reputation: 96797

What regex can I use to capture groups from this string?

Assume the following strings:

The numbers should match \d+. In all of the strings A will always be present, while B may not. A will always be followed by one or more digits, and so will B, if present. What regex could I use to capture the digits that follow A and B?

I have the following regex:

(A(\d+)).*?(B?(\d+)?)

but this only works for the first and the third case.
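To see why the pattern above misses some inputs, here is a minimal sketch. The sample strings are an assumption (the question's own strings are not shown here); they are borrowed from the test cases in one of the answers. The lazy `.*?` is happy to match nothing, and since `B?` and `(\d+)?` are both optional, the tail of the pattern can match the empty string as soon as the character after the A-number is not a B.

```python
import re

# The asker's pattern, unchanged.
pattern = re.compile(r"(A(\d+)).*?(B?(\d+)?)")

# Hypothetical inputs (assumed, not taken from the question).
samples = ["A01B100", "A01.B100", "A01"]

results = {}
for s in samples:
    m = pattern.match(s)
    # Group 2 holds A's digits, group 4 holds B's digits (None if not captured).
    results[s] = (m.group(2), m.group(4))
    print(s, results[s])
```

With these inputs, B's digits are only captured when B directly follows the A-number; any separator leaves group 4 empty.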

Upvotes: 1

Views: 144

Answers (4)

Thom Smith

Reputation: 14086

  • Must A precede B? Assuming yes.
  • Can B appear more than once? Assuming no.
  • Can B appear except as part of a B-number group? Assuming no.

Then,

A\d+.*?(B\d+)?

using the lazy .*? or

A\d+[^B]*(B\d+)?

which is more efficient but requires that B be a single character.

EDIT: Upon further reflection, I have parenthesized the patterns in a less-than-perfect way. The following patterns should require fewer assumptions:

A\d+(.*?B\d+)?
A\d+([^B]*B\d+)?
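A rough sketch of how these two patterns behave in Python. The capture groups around the digits and the non-capturing `(?:...)` wrapper are my additions, on the assumption that only the digits are wanted; the helper name `digits` and the sample strings are likewise made up for illustration.

```python
import re

# The two patterns from the edit, with captures moved onto the digits only.
lazy = re.compile(r"A(\d+)(?:.*?B(\d+))?")
negated = re.compile(r"A(\d+)(?:[^B]*B(\d+))?")

def digits(pattern, text):
    """Return (A digits, B digits or None), or None if there is no match."""
    m = pattern.search(text)
    return m.groups() if m else None

for text in ["A01B100", "A01.B100", "A01", "A01...B01...B23"]:
    print(text, digits(lazy, text), digits(negated, text))
```

Both variants stop at the first B-number they find, so for a string with two B groups they report the first one.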

Upvotes: 3

Jeff Ferland

Reputation: 18292

A\d+.*(B\d+)?

OK, so that provides something which passes all test cases... BUT it has some false positives.

A\d+(.*B\d+)?

It seems other characters should only appear if B(whatever) is after them, so use the above instead.

# Perl test case hackup
my @array = ('A01B100', 'A01.B100', 'A01', 'A01............................B100', 'A01FAIL', 'NEVER');
for (@array) {
    print "$_\n" if $_ =~ /^A\d+(.*B\d+)?$/;
}
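For comparison, a small Python sketch of the same test that runs both patterns side by side. Anchoring the first pattern with `^` and `$` is my assumption, made so that it can be compared with the anchored pattern from the Perl test; the names `loose` and `strict` are just labels for this example.

```python
import re

# First pattern (anchored here for comparison) and the stricter second one.
loose = re.compile(r"^A\d+.*(B\d+)?$")
strict = re.compile(r"^A\d+(.*B\d+)?$")

cases = ['A01B100', 'A01.B100', 'A01',
         'A01............................B100', 'A01FAIL', 'NEVER']

loose_hits = [s for s in cases if loose.search(s)]
strict_hits = [s for s in cases if strict.search(s)]

print("loose: ", loose_hits)
print("strict:", strict_hits)
```

The loose pattern accepts 'A01FAIL' because `.*` may swallow any trailing text, while the strict one only allows extra characters when a B-number follows them.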

Upvotes: 0

Evan Fosmark

Reputation: 101671

import re
# Note: \.* skips only literal dots between the A and B groups.
m = re.match(r"A(?P<d1>\d+)\.*(B(?P<d2>\d+))?", "A01.B100")
print(m.groupdict())  # {'d1': '01', 'd2': '100'}

Upvotes: 0

VonC

Reputation: 1324073

(?ms)^A(\d+)(?:[^\n\r]*B(\d+))?$

Assuming one string per line:

  • the [^\n\r]* is a greedy match for any characters (except newlines) after Axx, meaning it could gobble an intermediate Byy before the last B:

    A01...B01...B23

would be matched, with 01 and 23 detected.
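A quick sketch of that behaviour using Python's re module, with the pattern taken as-is from above. The second input, with one string per line, is a made-up example to show the effect of the multiline flag.

```python
import re

pattern = re.compile(r"(?ms)^A(\d+)(?:[^\n\r]*B(\d+))?$")

# The greedy [^\n\r]* runs to the end of the line and backtracks to the last B.
single = pattern.search("A01...B01...B23").groups()
print(single)

# Hypothetical multi-line input: each line is matched on its own.
multi = [m.groups() for m in pattern.finditer("A01\nA02.B7")]
print(multi)
```

If the first B-number is wanted instead of the last, the character class would need to be made lazy or exclude B.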

Upvotes: 1
