Reputation: 16107
I want to match "three uppercase letters, one lowercase letter, and three uppercase letters" using a regular expression. What makes this difficult is that the adjacent uppercase letters must be the same. For example, I expect AAAbCCC to match, but not AAAbCCD or ABAbCDC.
Here is what I've tried:
print(re.findall("[A-Z]{3}[a-z][A-Z]{3}", l))
However, this is not what I want, because it matches AAAbCCD and ABAbCDC as well.
Upvotes: 2
Views: 3013
Reputation: 149736
You can use capture groups and backreferences:
re.findall(r"(([A-Z])\2\2[a-z]([A-Z])\3\3)", string)
Note, however, that when the pattern contains groups, re.findall() returns the groups instead of the whole matches.
So to get the matched strings you need to enclose the whole pattern in parentheses and take the first group:
>>> s = "AAAbCCC AAAbCCD"
>>> [groups[0] for groups in re.findall(r"(([A-Z])\2\2[a-z]([A-Z])\3\3)", s)]
['AAAbCCC']
You can also use re.finditer(), which returns an iterator over the match objects:
>>> [match.group(1) for match in re.finditer(r"(([A-Z])\2\2[a-z]([A-Z])\3\3)", s)]
['AAAbCCC']
Upvotes: 2
Reputation: 6209
You can use ([A-Z])\1{2}[a-z]([A-Z])\2{2}.
It stores the first uppercase character it finds in a group and reuses it via \1 (and \2 for the second run) to check the two characters that follow.
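As a rough illustration of how this pattern could be used from Python, here is a minimal sketch. The helper name find_triples and the sample string are made up for the example; the pattern itself is the one given above. It reads group(0) from re.finditer() so that no extra outer group is needed.

```python
import re

# Pattern from the answer above: each capture group holds one uppercase
# letter, and the backreference must repeat it exactly twice more.
PATTERN = re.compile(r"([A-Z])\1{2}[a-z]([A-Z])\2{2}")

def find_triples(text):
    """Return every substring of the form XXXyZZZ found in text (hypothetical helper)."""
    # group(0) is the whole match, regardless of how many groups the pattern has.
    return [m.group(0) for m in PATTERN.finditer(text)]

sample = "AAAbCCC AAAbCCD ABAbCDC"  # made-up test string
print(find_triples(sample))  # ['AAAbCCC']
```

Because the pattern is not anchored, it also finds matches embedded in longer text.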
Upvotes: 3
Reputation: 41987
Use capturing groups and backreferences:
^([A-Z])\1\1[a-z]([A-Z])\2\2$
^([A-Z]) captures the first uppercase letter into group 1, and \1\1 matches the next two characters only if they are the same as the captured one. The same goes for the second group, which is later referenced by \2.
You can also use a repetition quantifier, {2}:
^([A-Z])\1{2}[a-z]([A-Z])\2{2}$
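Note that the ^ and $ anchors make this version suited to validating a whole string rather than searching inside a longer one. A small sketch of that usage, assuming Python's re module as in the question (the function name is_valid is hypothetical):

```python
import re

# Anchored pattern from above: the entire string must be XXXyZZZ.
ANCHORED = re.compile(r"^([A-Z])\1{2}[a-z]([A-Z])\2{2}$")

def is_valid(candidate):
    """Return True if candidate as a whole has the form XXXyZZZ (hypothetical helper)."""
    return ANCHORED.match(candidate) is not None

for candidate in ["AAAbCCC", "AAAbCCD", "ABAbCDC", "xAAAbCCC"]:
    print(candidate, is_valid(candidate))
```

In Python, $ also matches just before a trailing newline; if that matters, dropping the anchors and calling re.fullmatch() is an alternative.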
Upvotes: 3