I am extracting alphanumeric data from strings like these:

    ABCADE12345ZYX
    LMNADE12345ZXY

I need to extract ADE12345 from the first string and ADE12345 from the second string.
I tried the following regular expression:

    [ABC|LMN]+(\w+)Z.*

but it returns DE12345 for both strings, dropping the leading A.
How can I get the expected matches (ADE12345 and ADE12345) in Python using re?
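Use a non-capturing group for the alternation instead of a character class:

    (?:ABC|LMN)(\w+)Z

In the original attempt, [ABC|LMN] is a character class that matches any single character from the set A, B, C, |, L, M, N, so [ABC|LMN]+ also consumes the A at the start of ADE12345. (?:ABC|LMN) matches the literal prefix ABC or LMN exactly once.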
See proof.
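To see where the leading A goes, here is a minimal sketch that runs the question's pattern and the answer's pattern on the two sample strings and prints what each prefix part consumed before capture group 1. The names char_class, alternation and split_match are illustrative only, not from the original post.

```python
import re

samples = ["ABCADE12345ZYX", "LMNADE12345ZXY"]

# Question's attempt: [ABC|LMN] is a character class matching ONE character
# from {A, B, C, |, L, M, N}; with + it greedily eats "ABCA" / "LMNA",
# taking the leading A of "ADE12345" away from the capture group.
char_class = re.compile(r"[ABC|LMN]+(\w+)Z.*")

# Answer's pattern: (?:ABC|LMN) matches exactly the literal "ABC" or "LMN".
alternation = re.compile(r"(?:ABC|LMN)(\w+)Z")

def split_match(pattern, text):
    """Return (text consumed before group 1, group 1) for a match at the start of text."""
    m = pattern.match(text)
    return text[m.start():m.start(1)], m.group(1)

for s in samples:
    print(s, split_match(char_class, s), split_match(alternation, s))
```

This prints ABCADE12345ZYX ('ABCA', 'DE12345') ('ABC', 'ADE12345') and LMNADE12345ZXY ('LMNA', 'DE12345') ('LMN', 'ADE12345'): the character class swallows four characters, the alternation exactly three.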
Explanation

    (?:     group, but do not capture:
      ABC     'ABC'
      |       OR
      LMN     'LMN'
    )       end of grouping
    (       group and capture to \1:
      \w+     word characters (a-z, A-Z, 0-9, _), 1 or more
              times, matching as many as possible
    )       end of \1
    Z       'Z'

(In Python 3, \w in a str pattern also matches non-ASCII letters and digits unless the re.ASCII flag is set.)

    import re
    txt = 'ABCADE12345ZYX and LMNADE12345ZXY'
    # With exactly one capturing group, findall returns a list of group 1 values
    print(re.findall(r'(?:ABC|LMN)(\w+)Z', txt))
    # ['ADE12345', 'ADE12345']
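The question has two separate strings rather than one combined text. A minimal sketch for extracting from each string on its own uses re.search and group(1); extract, greedy and lazy are hypothetical names for illustration. It also shows one caveat of the greedy \w+: it backtracks only as far as the last Z in the run of word characters, so a hypothetical input whose tail contains a second Z (ABCADE12345ZYZ, not from the question) overshoots, while the lazy \w+? stops at the first Z.

```python
import re

# The answer's pattern: greedy \w+ captures up to the LAST usable Z.
greedy = re.compile(r"(?:ABC|LMN)(\w+)Z")
# Hypothetical variant: lazy \w+? captures up to the FIRST Z instead.
lazy = re.compile(r"(?:ABC|LMN)(\w+?)Z")

def extract(pattern, text):
    """Return group 1 of the first match in text, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

# The last string is a hypothetical input with a second Z in its tail.
for s in ["ABCADE12345ZYX", "LMNADE12345ZXY", "ABCADE12345ZYZ"]:
    print(s, extract(greedy, s), extract(lazy, s))
```

Both patterns return ADE12345 for the two strings in the question; only the hypothetical third input differs (ADE12345ZY greedy vs ADE12345 lazy). Which one is right depends on whether the payload itself can ever contain a Z.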