Boosted_d16
Boosted_d16

Reputation: 14112

python regex: expression to match number and letters

The aim is to print everything which does not end with "_C[any number+letter]".

def regexer():

import re
name_list = ['chrome_PM', 'chrome_P', 'chromerocker_C', 'chromebike_P1', 
                 'chromecar_CMale', 'chromeone_C1254']

for name in name_list:
    counts_tail = re.compile('_C\d*$')
    if not counts_tail.search(name):
        print name

Output:

chrome_PM
chrome_P
chromebike_P1
chromecar_CMale

How can I edit my code to avoid printing "chromecar_CMale"?

Upvotes: 1

Views: 3245

Answers (4)

JoErNanO
JoErNanO

Reputation: 2488

Let's invert the logic here. You are searching for things you don't want, and filtering them out. Instead why not search for the things you want?

Your regexp could look like this: _C[A-Za-z]+[\D], where:

  • _C is the starting C you need
  • [A-Za-z]+ matches any lower/upper case letter more than once
  • [\D] excludes there being digits after the letters, thus avoiding matching stuff like chromecar_CM123. Note: capital \D is the negation of the shorthand \d

So the Python code would become something like:

import re

def regexer():
    name_list = ['chrome_PM', 'chrome_P', 'chromerocker_C', 'chromebike_P1', 
             'chromecar_CMale', 'chromeone_C1254']

    counts_tail = re.compile('_C[A-Za-z]+[\D]') # Build regexp here - no need to do it in the loop
    for name in name_list:
        if counts_tail.search(name):
            print name

Here is the regexp in action:

enter image description here

Upvotes: 0

Avinash Raj
Avinash Raj

Reputation: 174844

Change your regex like below,

>>> import re
>>> name_list = ['chrome_PM', 'chrome_P', 'chromerocker_C', 'chromebike_P1', 
                 'chromecar_CMale', 'chromeone_C1254']
>>> for name in name_list:
    if re.search(r'^(?!.*(?:_C\d+|_C)$)(?=.*_C)', name):
        print(name)


chromecar_CMale

Explanation:

  • ^ Asserts that we are at the start.
  • (?!.*(?:_C\d+|_C)$) This negative lookahead asserts that there wouldn't be anything like _C one or more digits or _C immediately followed by an end of the line anchor.
  • (?=.*_C) Asserts that there must be a substring like _C would present.
  • Match the string starts ^ only if the above conditions are satisfied.
  • The above regex would be written as ^(?!.*_C(\d+)?$)(?=.*_C)

Upvotes: 1

qstebom
qstebom

Reputation: 739

I would extend the regexp to accept words ([0-9a-zA-Z_]):

re.compile('_C\w+$')

Of course, that will accept any combination of letters or numbers. If you want to restrict it to letters or numbers only, you can do the following:

re.compile('_C(\d+|[a-zA-Z]+)$')

Upvotes: 0

vks
vks

Reputation: 67988

_C[\da-zA-Z]*$

This should do it.

Upvotes: 1

Related Questions