Rupert Cobbe-Warburton

Reputation: 653

Multiple occurrences of the same character in a string regexp - Python

Given a string made up of 3 capital letters, 1 lowercase letter and another 3 capital letters, e.g. AAAaAAA

I can't seem to find a regexp that matches a string with this structure:

e.g. A B C a AA C (no spaces)

EDIT:

Turns out I needed something slightly different, e.g. ABCaAAC, where 'a' is the lowercase version of the very first character, not just any lowercase character.

Upvotes: 4

Views: 6697

Answers (1)

Andrew Clark

Reputation: 208405

The following should work:

^([A-Z])(?!.?\1)([A-Z])(?!\2)([A-Z])[a-z]\1\1\3$

For example:

>>> import re
>>> regex = re.compile(r'^([A-Z])(?!.?\1)([A-Z])(?!\2)([A-Z])[a-z]\1\1\3$')
>>> regex.match('ABAaAAA')  # fails: the first three are not all different
>>> regex.match('ABCaABC')  # fails: the 5th and 6th chars are not both the first char
>>> regex.match('ABCaAAB')  # fails: the last char is not the third char
>>> regex.match('ABCaAAC')  # matches!
<_sre.SRE_Match object at 0x7fe09a44a880>

Explanation:

^          # start of string
([A-Z])    # match any uppercase character, place in \1
(?!.?\1)   # fail if either of the next two characters is the character in \1
([A-Z])    # match any uppercase character, place in \2
(?!\2)     # fail if the next character is the same as the character in \2
([A-Z])    # match any uppercase character, place in \3
[a-z]      # match any lowercase character
\1         # match capture group 1
\1         # match capture group 1
\3         # match capture group 3
$          # end of string
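The explanation above is essentially the pattern written out with one comment per piece. As a sketch, Python's re.VERBOSE flag lets you keep those comments inside the pattern itself, since whitespace and # comments are ignored in verbose mode. The name PATTERN is just illustrative:

```python
import re

# Same pattern as the one-liner above, written with re.VERBOSE so each piece
# carries its own comment (whitespace and '#' comments are ignored).
PATTERN = re.compile(r"""
    ^
    ([A-Z])      # group 1: first uppercase letter
    (?!.?\1)     # neither of the next two characters may equal group 1
    ([A-Z])      # group 2: second uppercase letter
    (?!\2)       # the next character may not equal group 2
    ([A-Z])      # group 3: third uppercase letter
    [a-z]        # any lowercase letter
    \1\1         # the first letter, twice
    \3           # the third letter
    $
""", re.VERBOSE)

for candidate in ["ABAaAAA", "ABCaABC", "ABCaAAB", "ABCaAAC"]:
    print(candidate, bool(PATTERN.match(candidate)))
```

Only ABCaAAC should print True, matching the interactive session above.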

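If you want to pull these matches out of a larger chunk of text, get rid of the ^ and $ and use regex.search(), regex.finditer() or regex.findall(). Note that because the pattern contains capture groups, findall() returns a tuple of the three groups for each match rather than the whole matched string, and without anchors a match can also be found in the middle of a longer run of letters.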
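A minimal sketch of the unanchored version, assuming the codes you want stand alone as words. Here ^ and $ are swapped for \b word boundaries (an assumption, so that something like QABCaAACR is not matched in the middle), and the sample text is made up for illustration:

```python
import re

# ^ and $ replaced by \b (assumption: codes appear as standalone words).
pattern = re.compile(r'\b([A-Z])(?!.?\1)([A-Z])(?!\2)([A-Z])[a-z]\1\1\3\b')

text = "codes: ABCaAAC, XYZxXXZ and ABAaAAA"

# findall() returns the capture groups of each match, not the whole match
print(pattern.findall(text))  # [('A', 'B', 'C'), ('X', 'Y', 'Z')]

# finditer() + group(0) gives the full matched text
print([m.group(0) for m in pattern.finditer(text)])  # ['ABCaAAC', 'XYZxXXZ']
```

If you do want the full strings from findall(), an alternative is to wrap the whole pattern in one extra group, but finditer() keeps the group numbering unchanged.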

You may, however, find the following approach easier to understand. It uses a regex for the basic validation, then plain string operations to check all of the extra requirements:

import re

def validate(s):
    # regex checks the overall shape; plain comparisons check the extra rules
    return bool(re.match(r'^[A-Z]{3}[a-z][A-Z]{3}$', s) and s[4] == s[0] and
                s[5] == s[0] and s[-1] == s[2] and len(set(s[:3])) == 3)

>>> validate('ABAaAAA')
False
>>> validate('ABCaABC')
False
>>> validate('ABCaAAB')
False
>>> validate('ABCaAAC')
True
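Both the regex and validate() accept any lowercase letter in the middle, while the question's EDIT asks for the lowercase form of the first character. A sketch of a variant that checks that too, using the same regex-then-string-checks idea; validate_strict is a hypothetical name, and re.fullmatch needs Python 3.4 or later:

```python
import re

def validate_strict(s):
    """Hypothetical variant of validate() that also applies the EDIT:
    the lowercase letter must be the lowercase form of the first character."""
    return bool(re.fullmatch(r'[A-Z]{3}[a-z][A-Z]{3}', s)
                and len(set(s[:3])) == 3      # first three letters all differ
                and s[3] == s[0].lower()      # e.g. 'a' after a leading 'A'
                and s[4] == s[5] == s[0]      # 5th and 6th repeat the first
                and s[6] == s[2])             # last repeats the third

for candidate in ["ABCaAAC", "ABCbAAC", "XYZxXXZ", "ABAaAAA"]:
    print(candidate, validate_strict(candidate))
```

Because the regex check runs first and short-circuits, the index lookups only happen on strings already known to be exactly seven characters long.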

Upvotes: 11
