kspr

Reputation: 1040

Optional match group at the start of a regex always empty

I have a string asd@12A/AXB25017/12A@£££ from which I want to extract 12A/AXB25017/12A.

I have designed the following regex pattern:

'.*(([A-Z0-9]+/)?[A-Z]{3}\d+(/[A-Z0-9]+)?)'

The pattern has optional parts because the string can come in many variations. Valid substrings that I want to extract, with varying letters and numbers, include:

1) AXB25017
2) 1/AXB25017
3) AXB25017/1
4) 1A/AXB25017
5) AXB25017/1A
6) EN/AXB25017
7) EN/AXB25017/1
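One way to check that the core of the pattern (everything after the leading .*) accepts each variant above is re.fullmatch, which requires the whole string to match. This is a minimal sketch; the name CORE is illustrative, and 12A/AXB25017/12A is added from the example string:

```python
import re

# Core of the question's pattern, without the leading .*
CORE = r'([A-Z0-9]+/)?[A-Z]{3}\d+(/[A-Z0-9]+)?'

variants = ['AXB25017', '1/AXB25017', 'AXB25017/1', '1A/AXB25017',
            'AXB25017/1A', 'EN/AXB25017', 'EN/AXB25017/1', '12A/AXB25017/12A']

# fullmatch succeeds only if the pattern covers the entire variant
for v in variants:
    print(v, '->', 'ok' if re.fullmatch(CORE, v) else 'no match')
```

If every line prints ok, the core pattern itself is fine and the problem lies in how it interacts with the leading .*.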

When I run

 re.match(r'.*(([A-Z0-9]+/)?[A-Z]{3}\d+(/[A-Z0-9]+)?)', '  @12A/AXB25017/12A@').group(1)

It does not capture the leading 12A/; it returns only AXB25017/12A.

What am I missing in my pattern to correctly catch 12A/ as well?

Upvotes: 1

Views: 58

Answers (1)

collapsar

Reputation: 17238

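The first optional part of your regex, ([A-Z0-9]+/)?, does not match because ...

  • it is optional, so nothing forces it to consume 12A/
  • the preceding 'match all' subexpression .* is greedy: it first grabs the whole string and gives characters back from the end only until the rest of the pattern can match, which first happens at AXB25017, leaving the optional group empty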
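The effect of greediness can be observed directly. A small sketch: with the original pattern the optional prefix group (group 2) comes back as None. For comparison only, it also tries a lazy .*?, which is a swapped-in alternative not proposed in this answer; it stops at the earliest position where the rest of the pattern matches:

```python
import re

s = '  @12A/AXB25017/12A@'

# Original pattern: greedy .* backtracks from the end, so the match starts at AXB25017
greedy = re.match(r'.*(([A-Z0-9]+/)?[A-Z]{3}\d+(/[A-Z0-9]+)?)', s)
print(greedy.group(1))  # AXB25017/12A
print(greedy.group(2))  # None: the optional prefix group did not participate

# Lazy .*? (alternative, not from the answer): the match starts at the first possible position
lazy = re.match(r'.*?(([A-Z0-9]+/)?[A-Z]{3}\d+(/[A-Z0-9]+)?)', s)
print(lazy.group(1))  # 12A/AXB25017/12A
```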

Include the @ delimiter in the regex, so that .* has to stop right before it:

.*\@(([A-Z0-9]+/)?[A-Z]{3}\d+(/[A-Z0-9]+)?)
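A quick check of this pattern against both strings from the question, as a sketch. The pattern is written as a raw string so that \@ and \d reach the regex engine unchanged (in Python's re, \@ simply matches a literal @):

```python
import re

PATTERN = r'.*\@(([A-Z0-9]+/)?[A-Z]{3}\d+(/[A-Z0-9]+)?)'

for s in ['  @12A/AXB25017/12A@', 'asd@12A/AXB25017/12A@£££']:
    # .* can now only give back characters up to an @ that precedes a valid code
    print(re.match(PATTERN, s).group(1))  # 12A/AXB25017/12A
```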

If material may follow the second delimiter, add that delimiter to the regex as well ...

.*\@(([A-Z0-9]+/)?[A-Z]{3}\d+(/[A-Z0-9]+)?)\@
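One case where the closing delimiter matters, sketched with a hypothetical input in which the text after the second @ itself looks like a code (XYZ1 is made up for illustration). Because .* is greedy, the version without the closing \@ settles on the last @ in the string:

```python
import re

OPEN_ONLY = r'.*\@(([A-Z0-9]+/)?[A-Z]{3}\d+(/[A-Z0-9]+)?)'
BOTH = r'.*\@(([A-Z0-9]+/)?[A-Z]{3}\d+(/[A-Z0-9]+)?)\@'

# Hypothetical input: trailing material after the second @ resembles a code
s = 'asd@12A/AXB25017/12A@XYZ1'

print(re.match(OPEN_ONLY, s).group(1))  # XYZ1 (greedy .* jumps to the last @)
print(re.match(BOTH, s).group(1))       # 12A/AXB25017/12A
```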

... which might allow for a drastic simplification:

\@[^@]+\@
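A sketch of the simplified pattern in use. Since it has no leading .*, re.search is used instead of re.match, which would fail because the string starts with asd. Note that the match includes both @ characters, and that [^@]+ no longer validates the code format; the capturing variant @([^@]+)@ is an illustrative adjustment, not part of the answer:

```python
import re

s = 'asd@12A/AXB25017/12A@£££'

# The simplified pattern as given; the whole match includes the delimiters
print(re.search(r'\@[^@]+\@', s).group(0))  # @12A/AXB25017/12A@

# Wrapping [^@]+ in a group returns only the text between the delimiters
print(re.search(r'@([^@]+)@', s).group(1))  # 12A/AXB25017/12A
```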

Upvotes: 3
