Python regex to detect one of multiple optional substrings following a string

Question

I need to match patterns like the following: AAXX#

Where:
* AA is from a set (ie. a list) of 1-3 char alpha prefixes,
* XX is from a different list of pre-defined strings, and
* any single-digit numeral follows.

AA strings: ['bo','h','fr','sam','pe']

XX strings: cl + ['x','n','r','nr','eaner'] //OR ELSE JUST// ro

Desired Result: bool indicating whether any of the possible combos match the provided string.

Sample Test Strings:
item = "boro1" - that is, bo + ro + 1
item = "samcl2" - i.e. sam + cl + 2
item = "hcln3" - i.e. h + cln + 3

The best I can figure is to use a loop, but I am having trouble with the essential regex. It works for the single-letter optionals cln, clx, clr, but not for the longer ones clnr, cleaner.

Code:

import re

item = "hclnr2" #h + clnr + 2
out = False
arr = ['bo','h','fr','sam','pe']
for mnrl in arr:
    myrx = re.escape(mnrl) + r'cl[x|n|r|nr|eaner]\d'
    thisone = bool(re.search(myrx, item))
    print('mnrl: '+mnrl+' - ', thisone)
    if thisone: out = True

##########################################################################
# SKIP THIS - INCLUDED IN CASE S/O HAS A BETTER SOLUTION THAN A SECOND LOOP
# THE ABOVE FOR-LOOP handled THE CL[opts] TESTS, THIS LOOP DOES THE RO TESTS
##########################################################################
#if not out: #If not found a match amongst the "cl__" options, test for "ro"
#    for mnrl in arr:
#        myrx = re.escape(mnrl) + r'ro\d'
#        thisone = bool(re.search(myrx, item))
#        print('mnrl: '+mnrl+' - ', thisone)
#        if thisone: out = True
##########################################################################

print('result: ', out)

PRINTS:

mnrl: bo - False
mnrl: h - False <======
mnrl: fr - False
mnrl: sam - False
mnrl: pe - False

However, changing `item` to:

item = "hcln2" #h + cln + 2

PRINTS:
mnrl: bo - False
mnrl: h - True <========
mnrl: fr - False
mnrl: sam - False
mnrl: pe - False

The same holds for item = "hclr5" and item = "hclx9", but not for item = "hcleaner9".

SpghttCd · Accepted Answer

My approach would be:

import re

words = ['boro1', 'samcl2', 'hcln3', 'boro1+unwantedstuff']

p = r'(bo|h|fr|sam|pe)(cl(x|n|r|nr|eaner|)|ro)\d$'

for w in words:
    print(re.match(p, w))

Result:

<_sre.SRE_Match object; span=(0, 5), match='boro1'>
<_sre.SRE_Match object; span=(0, 6), match='samcl2'>    
<_sre.SRE_Match object; span=(0, 5), match='hcln3'>
None
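
For context on why the attempt in the question failed: square brackets define a character class, which matches exactly one character from the set, and inside it `|` is a literal pipe rather than alternation. So `cl[x|n|r|nr|eaner]` accepts a single x, n, r, e, a or |, which is why hcln2 matched but hclnr2 and hcleaner9 did not. A minimal sketch of the difference (the names `char_class`, `group` and `results` are only for illustration, and the prefix is fixed to h to keep it short):

```python
import re

# Character class: matches exactly ONE character from {x, |, n, r, e, a}.
char_class = re.compile(r'hcl[x|n|r|nr|eaner]\d')

# Non-capturing group with alternation: matches one whole alternative.
# Unlike the pattern in this answer, a bare "cl" is not accepted here.
group = re.compile(r'hcl(?:x|n|r|nr|eaner)\d')

results = {}
for item in ['hcln2', 'hclnr2', 'hcleaner9', 'hcl|2']:
    results[item] = (bool(char_class.search(item)), bool(group.search(item)))
    print(item, results[item])
```

The last test string shows the side effect of the literal pipe: the character class accepts hcl|2, while the group does not.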

For your desired boolean output you can simply cast the match object to `bool`, e.g. bool(re.match(p, w)).
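
If the prefixes and suffixes live in lists, as in the question, the pattern can be assembled from them instead of being typed out by hand. The following is a sketch under a few assumptions: the names `PREFIXES`, `CL_SUFFIXES`, `build_pattern` and `is_valid` are hypothetical, and it uses `re.fullmatch` in place of `re.match` with a trailing `$` (the two differ slightly, since `$` also matches just before a trailing newline).

```python
import re

# Hypothetical names; the values are the lists given in the question.
PREFIXES = ['bo', 'h', 'fr', 'sam', 'pe']
CL_SUFFIXES = ['x', 'n', 'r', 'nr', 'eaner']


def build_pattern(prefixes, cl_suffixes):
    """Compile <prefix>(cl[<suffix>] | ro)<digit> from the two lists."""
    prefix_alt = '|'.join(re.escape(p) for p in prefixes)
    suffix_alt = '|'.join(re.escape(s) for s in cl_suffixes)
    # The trailing ? makes the suffix after "cl" optional, so "samcl2" matches.
    return re.compile(rf'(?:{prefix_alt})(?:cl(?:{suffix_alt})?|ro)\d')


PATTERN = build_pattern(PREFIXES, CL_SUFFIXES)


def is_valid(item):
    """Return True only if the whole string fits the pattern."""
    return PATTERN.fullmatch(item) is not None


for item in ['boro1', 'samcl2', 'hclnr2', 'hcleaner9', 'boro1+unwantedstuff']:
    print(item, is_valid(item))
```

This replaces both loops from the question with a single call, and `re.escape` keeps the pattern safe if a list entry ever contains a regex metacharacter.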
