I want to identify words (in a dictionary structure) that have 2 sets of double letters.
I am new to Python and regex, but I have managed to put together code that is nearly there, based on some similar questions elsewhere on the site. However, it doesn't quite work.
It picks up two sets of doubles, but only if they are the same letter, and it picks them up even if they are separated. I think the second use of \1 is the problem: it only works if the letter is the same as the one in the first capture group. Testing on regex101 confirms this, but I am not sure how to adapt the regex to get the match right.
Any pointers to where I am going wrong would be appreciated.
# logic being [any letter]* [any letter repeated] [any letter]* [any letter repeated] [any letter]*
import json
import re

dict_data = {"hello":0, "aaoo":0, "aabaa":0, "aaaba":0, "bookkeeping":0, "bookkeeooping":0}
for key in dict_data:
    if re.search(r'\b.*(.)\1.*(.)\1.*\b', key):
        print("Match found: ", key)
    else:
        print("No match: ", key)
Output is:
No match: hello
No match: aaoo #This should work but doesn't
Match found: aabaa #This works
Match found: aaaba #This shouldn't, assume it is matching either 2nd&3rd a or 3rd&4th a
No match: bookkeeping #This should match but doesn't
Match found: bookkeeooping #This works, assume it is matching oo twice
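A quick diagnostic sketch (the names ORIGINAL and captured are just illustrative) prints what each capture group of the original pattern actually holds. It suggests the guesses in the comments above are only partly right: for "aaaba" the second group captures "b" and the trailing \1 then matches an "a", so the second "double" is really "ba", while for "bookkeeooping" both groups do hold "o".

```python
import re

# The pattern from the question: its second backreference is \1, not \2
ORIGINAL = r'\b.*(.)\1.*(.)\1.*\b'

captured = {}
for key in ["hello", "aaoo", "aabaa", "aaaba", "bookkeeping", "bookkeeooping"]:
    m = re.search(ORIGINAL, key)
    # groups() shows the character held by each capture group; None means no match
    captured[key] = m.groups() if m else None
    print(key, captured[key])
```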
The second \1 refers to the value of the first capturing group, while you need to refer to the value of the second group, with \2.
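If keeping track of group numbers is error-prone, Python's named groups are one alternative (a sketch, not what this answer uses): (?P<name>...) captures a character and (?P=name) requires that same character again. The group names first and second and the constant NAMED_DOUBLES are arbitrary choices for illustration.

```python
import re

# (?P<first>.) captures one character and (?P=first) requires it again.
# The second pair has its own name, so it may be a different letter.
NAMED_DOUBLES = re.compile(r'(?P<first>.)(?P=first).*(?P<second>.)(?P=second)')

m = NAMED_DOUBLES.search("bookkeeping")
print(m.group("first"), m.group("second"))  # o e
```

Because .* is greedy, the second pair found in "bookkeeping" is the last one ("ee") rather than "kk".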
re.search looks for a regex match anywhere in the input string, so you do not need .* on both ends of the pattern.
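To illustrate the point about re.search: it already tries every starting position, so the .* padding only matters when the whole string must match, as with re.fullmatch. The sketch below checks that both forms agree on the sample keys; this assumes keys contain no newlines, since . does not match \n by default.

```python
import re

keys = ["hello", "aaoo", "aabaa", "aaaba", "bookkeeping", "bookkeeooping"]

results = {}
for key in keys:
    # re.search finds the pattern anywhere in the string
    found = re.search(r'(.)\1.*(.)\2', key) is not None
    # re.fullmatch must consume the whole string, so .* is needed on both ends
    whole = re.fullmatch(r'.*(.)\1.*(.)\2.*', key) is not None
    assert found == whole
    results[key] = found
    print(key, found)
```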
Use
import re

dict_data = {"hello":0, "aaoo":0, "aabaa":0, "aaaba":0, "bookkeeping":0, "bookkeeooping":0}
for key in dict_data:
    if re.search(r'(.)\1.*(.)\2', key):
        print("Match found: ", key)
    else:
        print("No match: ", key)
See the Python demo yielding
No match: hello
Match found: aaoo
Match found: aabaa
No match: aaaba
Match found: bookkeeping
Match found: bookkeeooping
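If the goal is to collect the matching words from the dictionary, one option is to compile the same pattern once and filter the keys. The helper has_two_doubles is a hypothetical name. Note two behaviours of this pattern: the two doubles may be the same letter ("aaaa" matches), and they cannot overlap, so "aaa" does not count as two doubles.

```python
import re

# A doubled character, then anything (possibly nothing), then another doubled character
TWO_DOUBLES = re.compile(r'(.)\1.*(.)\2')

def has_two_doubles(word: str) -> bool:
    """Return True if word contains two non-overlapping doubled characters."""
    return TWO_DOUBLES.search(word) is not None

dict_data = {"hello": 0, "aaoo": 0, "aabaa": 0, "aaaba": 0,
             "bookkeeping": 0, "bookkeeooping": 0}

matches = [key for key in dict_data if has_two_doubles(key)]
print(matches)  # ['aaoo', 'aabaa', 'bookkeeping', 'bookkeeooping']

print(has_two_doubles("aaaa"), has_two_doubles("aaa"))  # True False
```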