Matching multiple double characters in a word - Python regex

Question

I want to identify words (in a dictionary structure) that have 2 sets of double letters.

I am new to Python and regex, but I have managed to pull together code that is nearly there from some similar questions elsewhere on the site. It doesn't quite work, though.

It picks up two sets of doubles, but only if they are the same letter, and it also picks them up when they are separated. I think the second use of \1 is the problem: it only works if the letter is the same as the first capture group. regex101 confirms this, but I am not sure how to adapt the regex to get the right match.

Any pointers to where I am going wrong would be appreciated.

# Logic: [any letters]* [a repeated letter] [any letters]* [a repeated letter] [any letters]*

import re

dict_data = {"hello":0, "aaoo":0, "aabaa":0, "aaaba":0, "bookkeeping":0, "bookkeeooping":0}
for key in dict_data:
    if re.search(r'\b.*(.)\1.*(.)\1.*\b', key):
        print("Match found: ", key)
    else:
        print("No match:    ", key)

Output is:

No match:     hello
No match:     aaoo          #This should work but doesn't
Match found:  aabaa         #This works
Match found:  aaaba         #This shouldn't, assume it is matching either 2nd&3rd a or 3rd&4th a
No match:     bookkeeping   #This should match but doesn't
Match found:  bookkeeooping #This works, assume it is matching oo twice

Wiktor Stribiżew · Accepted Answer

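The second \1 refers to the value of the first capturing group, whereas you need to refer to the value of the second group, with \2. As written, the second (.)\1 matches any character followed by a copy of the first group's character, which is why aaaba matches: after the doubled a, the pair ba satisfies it.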
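To make the difference between the two backreferences concrete, here is a small sketch that compares them on words from the question. The variable names (wrong, right, m_wrong, m_right) are only illustrative, and the "wrong" pattern is the asker's regex with the surrounding \b.* parts dropped for brevity.

```python
import re

# Asker's pattern without the surrounding \b.* parts: the second \1 still
# points at group 1, so the second "(.)\1" means
# "any character followed by a copy of group 1".
wrong = re.compile(r'(.)\1.*(.)\1')

# Corrected pattern: \2 points at group 2, so each pair is a true double.
right = re.compile(r'(.)\1.*(.)\2')

m_wrong = wrong.search("aaaba")
print(m_wrong.groups())        # ('a', 'b'): the "second double" is really "ba"

print(right.search("aaaba"))   # None: there is only one genuine double

m_right = right.search("bookkeeping")
print(m_right.group(0))        # ookkee
print(m_right.groups())        # ('o', 'e')
```

Printing groups() like this is a quick way to see which characters each group captured when a pattern matches unexpectedly.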

re.search looks for a regex match anywhere in the input string, so you do not need .* at either end of the pattern.
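As a rough illustration of why the leading and trailing .* are unnecessary, the sketch below contrasts re.search with re.match and re.fullmatch on a single doubled character. The variable names are illustrative only.

```python
import re

pattern = re.compile(r'(.)\1')  # one doubled character

word = "hello"
found_anywhere = pattern.search(word)   # scans forward through the whole string
found_at_start = pattern.match(word)    # only tries to match at position 0
found_whole = pattern.fullmatch(word)   # must consume the entire string

print(found_anywhere.group(0))  # ll
print(found_at_start)           # None
print(found_whole)              # None
```

Padding with .* would only be needed with re.match or re.fullmatch, which anchor the pattern to the start or to the whole string.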

Use

import re

dict_data = {"hello":0, "aaoo":0, "aabaa":0, "aaaba":0, "bookkeeping":0, "bookkeeooping":0}
for key in dict_data:
    if re.search(r'(.)\1.*(.)\2', key):
        print("Match found: ", key)
    else:
        print("No match:    ", key)

This prints:

No match:     hello
Match found:  aaoo
Match found:  aabaa
No match:     aaaba
Match found:  bookkeeping
Match found:  bookkeeooping
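Note that . matches any character, not just letters, so a key such as "11--" would also count as having two doubles. If only letters should qualify, one possible variation is sketched below. The helper name has_two_doubles, the constant TWO_DOUBLES, and the restriction to ASCII letters are assumptions for illustration, not part of the original answer.

```python
import re

# Assumption: only ASCII letters should count as "double letters".
TWO_DOUBLES = re.compile(r'([a-zA-Z])\1.*([a-zA-Z])\2')

def has_two_doubles(word):
    """Return True if word contains two non-overlapping doubled letters."""
    return TWO_DOUBLES.search(word) is not None

dict_data = {"hello": 0, "aaoo": 0, "aabaa": 0, "aaaba": 0,
             "bookkeeping": 0, "bookkeeooping": 0}

matches = [key for key in dict_data if has_two_doubles(key)]
print(matches)  # ['aaoo', 'aabaa', 'bookkeeping', 'bookkeeooping']

print(has_two_doubles("11--"))  # False: digits and punctuation no longer count
print(has_two_doubles("aaaa"))  # True: "aa" followed by "aa"
```

Compiling the pattern once and wrapping it in a function keeps the loop readable when the same check runs over many keys.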
