Evan Brittain

Reputation: 587

Regular expression to match a substring

I am working on alphanumeric data extraction from strings like

ABCADE12345ZYX
LMNADE12345ZXY

I need to extract ADE12345 from each of these strings.

I have tried to use the following regular expression:

[ABC|LMN]+(\w+)Z.*

But this returns DE12345 in both cases, dropping the leading A.

How can I get the expected match, ADE12345, for both strings in Python using re?

Upvotes: 1

Views: 73

Answers (1)

Ryszard Czech

Reputation: 18611

Use this regex:

(?:ABC|LMN)(\w+)Z

See proof.

Explanation

--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    ABC                      'ABC'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    LMN                      'LMN'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  Z                        'Z'
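
For context on why the attempt in the question loses the leading A: `[ABC|LMN]` is a character class, not an alternation, so it matches any single one of the characters A, B, C, |, L, M, N. With `+` it keeps going and also consumes the A of ADE. The sketch below compares the two patterns on the sample strings from the question; the variable names are only illustrative.

```python
import re

samples = ['ABCADE12345ZYX', 'LMNADE12345ZXY']

# Pattern from the question: a character class followed by +
char_class = re.compile(r'[ABC|LMN]+(\w+)Z.*')
# Non-capturing group with alternation: matches the literal prefix ABC or LMN
grouped = re.compile(r'(?:ABC|LMN)(\w+)Z')

class_results = [char_class.search(s).group(1) for s in samples]
group_results = [grouped.search(s).group(1) for s in samples]

print(class_results)  # ['DE12345', 'DE12345']
print(group_results)  # ['ADE12345', 'ADE12345']
```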

Python code:

import re
txt = 'ABCADE12345ZYX and LMNADE12345ZXY'
print(re.findall(r'(?:ABC|LMN)(\w+)Z', txt))
# ['ADE12345', 'ADE12345']
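
If each string is processed on its own, `re.search` may be more convenient than `re.findall`. The following is a minimal sketch under two assumptions that the question does not state: the ABC/LMN prefix always sits at the start of the string, and the wanted part ends at the first Z. The helper name `extract_code` is hypothetical. Note that `\w+?` is lazy, which differs from the greedy `\w+` only when the string contains more than one Z after the prefix.

```python
import re

# Assumption: prefix at the start of the string, value ends at the first Z
PATTERN = re.compile(r'^(?:ABC|LMN)(\w+?)Z')

def extract_code(s):
    """Return the part between the ABC/LMN prefix and the first Z, or None."""
    m = PATTERN.search(s)
    return m.group(1) if m else None

print(extract_code('ABCADE12345ZYX'))   # ADE12345
print(extract_code('LMNADE12345ZXY'))   # ADE12345
print(extract_code('XYZADE12345ZYX'))   # None (prefix is neither ABC nor LMN)
print(extract_code('ABCADE1Z2345ZYX'))  # ADE1 (greedy \w+ would give ADE1Z2345)
```

For the two sample strings in the question, the lazy and greedy versions give the same result, since each contains only one Z after the prefix.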

Upvotes: 2
