Reputation: 14112

python regex: expression to match number and letters

The aim is to print every name that does not end with "_C" followed by any combination of digits and letters ("_C[any number+letter]").

import re

def regexer():
    name_list = ['chrome_PM', 'chrome_P', 'chromerocker_C', 'chromebike_P1',
                 'chromecar_CMale', 'chromeone_C1254']

    for name in name_list:
        counts_tail = re.compile(r'_C\d*$')
        if not counts_tail.search(name):
            print(name)

regexer()

Output:

chrome_PM
chrome_P
chromebike_P1
chromecar_CMale

How can I edit my code to avoid printing "chromecar_CMale"?
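One way to reach the stated aim, sketched under the assumption that "any number+letter" means any (possibly empty) mix of ASCII letters and digits after "_C": widen the \d* class so it also accepts letters. In this sketch regexer takes the list as a parameter and returns the kept names, which differs from the function above.

```python
import re

# Assumption: "any number+letter" means any run of ASCII letters and/or digits
# (possibly empty) between "_C" and the end of the name.
COUNTS_TAIL = re.compile(r'_C[A-Za-z0-9]*$')


def regexer(names):
    """Return the names that do NOT end with _C plus letters/digits."""
    return [name for name in names if not COUNTS_TAIL.search(name)]


name_list = ['chrome_PM', 'chrome_P', 'chromerocker_C', 'chromebike_P1',
             'chromecar_CMale', 'chromeone_C1254']

for name in regexer(name_list):
    print(name)
```

With the sample list this prints chrome_PM, chrome_P and chromebike_P1; chromecar_CMale is now filtered out along with the digit-only tails.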

Upvotes: 1

Answers (4)

JoErNanO

Reputation: 2488

Let's invert the logic here. You are searching for things you don't want, and filtering them out. Instead why not search for the things you want?

Your regexp could look like this: _C[A-Za-z]+[\D], where:

_C is the starting C you need
[A-Za-z]+ matches one or more lower/upper case letters
[\D] requires one character that is not a digit right after the letters, thus avoiding matching stuff like chromecar_CM123. Note: capital \D is the negation of the shorthand \d; the square brackets around it are optional.
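A quick check of what each part of the pattern requires. The inputs other than chromecar_CMale and chromeone_C1254 are hypothetical test strings. Because [\D] consumes exactly one character, a name with a single letter after _C (chromecar_CM) is not matched, and digits that come after two or more letters (chromecar_CMa123) are not excluded.

```python
import re

pattern = re.compile(r'_C[A-Za-z]+[\D]')

# Names from the question plus hypothetical edge cases
samples = ['chromecar_CMale', 'chromecar_CM123', 'chromecar_CM',
           'chromecar_CMa123', 'chromeone_C1254']

for s in samples:
    m = pattern.search(s)
    # m.group(0) is the matched substring, or None when nothing matched
    print(s, '->', m.group(0) if m else None)
```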

So the Python code would become something like:

import re

def regexer():
    name_list = ['chrome_PM', 'chrome_P', 'chromerocker_C', 'chromebike_P1',
                 'chromecar_CMale', 'chromeone_C1254']

    counts_tail = re.compile(r'_C[A-Za-z]+[\D]')  # Build regexp here - no need to do it in the loop
    for name in name_list:
        if counts_tail.search(name):
            print(name)

regexer()
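If the intent is "only letters between _C and the end of the name", an end-of-string anchor $ expresses that directly instead of relying on one trailing non-digit. This is a sketch of a different pattern from the one in this answer: unlike _C[A-Za-z]+[\D], it also accepts a single letter such as chromecar_CM and rejects names like chromecar_CMa123.

```python
import re

name_list = ['chrome_PM', 'chrome_P', 'chromerocker_C', 'chromebike_P1',
             'chromecar_CMale', 'chromeone_C1254']

# Variant pattern: "_C", one or more letters, then the end of the string
letters_tail = re.compile(r'_C[A-Za-z]+$')

matches = [name for name in name_list if letters_tail.search(name)]
print(matches)
```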

[Image: screenshot of the regexp in action]

Upvotes: 0

Avinash Raj

Reputation: 174844

Change your regex as below:

>>> import re
>>> name_list = ['chrome_PM', 'chrome_P', 'chromerocker_C', 'chromebike_P1',
...              'chromecar_CMale', 'chromeone_C1254']
>>> for name in name_list:
...     if re.search(r'^(?!.*(?:_C\d+|_C)$)(?=.*_C)', name):
...         print(name)
...
chromecar_CMale

Explanation:

^ asserts that we are at the start of the string.
(?!.*(?:_C\d+|_C)$) This negative lookahead asserts that the string does not end with _C followed by one or more digits, or with a bare _C.
(?=.*_C) This positive lookahead asserts that the substring _C appears somewhere in the string.
The empty match at the start ^ succeeds only if both of the above conditions are satisfied.
The above regex could also be written as ^(?!.*_C(\d+)?$)(?=.*_C)
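A quick check, using the question's name_list plus a few hypothetical strings, that the two spellings of the lookahead pattern select the same names. It only compares them on these inputs; it is not a proof of equivalence in general.

```python
import re

name_list = ['chrome_PM', 'chrome_P', 'chromerocker_C', 'chromebike_P1',
             'chromecar_CMale', 'chromeone_C1254']

# The two spellings of the lookahead pattern from the explanation above
long_form = re.compile(r'^(?!.*(?:_C\d+|_C)$)(?=.*_C)')
short_form = re.compile(r'^(?!.*_C(\d+)?$)(?=.*_C)')

long_hits = [name for name in name_list if long_form.search(name)]
short_hits = [name for name in name_list if short_form.search(name)]
print(long_hits, short_hits)

# Hypothetical extra inputs: bare _C, digits only, letters then digits, no _C
for s in ['a_C', 'a_C12', 'a_Cx1', 'abc']:
    print(s, bool(long_form.search(s)), bool(short_form.search(s)))
```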