Using Python regex with backreference matches

Question

I have a question about regexes with backreferences.

I need to find repeated characters in a string. I tried the regex (\w)\1{1,}, but it only matches consecutive repeats. I'm stuck on how to improve it so that it captures all repeated characters. Some examples:

import re

s = 'capitals'  # renamed from str to avoid shadowing the built-in

re.search(r'(\w)\1{1,}', s)

Output: None

import re

s = 'butterfly'

re.search(r'(\w)\1{1,}', s)

Output: <_sre.SRE_Match object; span=(2, 4), match='tt'>

Henry · Accepted Answer

I would use r'(\w).*\1' so that a repeated character is matched even if there are other characters, including special characters or spaces, in between.

However, this won't work when repeated characters overlap the span of an earlier match, as in the string abcdabcd: only the first group (a) is recognized, and the other repeated characters enclosed in that match (b, c, d) are ignored.

Check the demo: https://regex101.com/r/m5UfAe/1
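To make the limitation concrete, here is a small sketch that runs the pattern on the strings from this thread. The variable name pattern is only for illustration.

```python
import re

pattern = re.compile(r'(\w).*\1')

# 'capitals' has no adjacent duplicates, but 'a' occurs twice,
# so the match runs from the first 'a' to the last one.
m = pattern.search('capitals')
print(m.group(0), m.group(1))  # apita a

# findall reports group 1 once per non-overlapping match. The first
# match consumes 'abcda', so the later 'b', 'c', 'd' have no partner left.
print(pattern.findall('abcdabcd'))  # ['a']
```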

So an alternative (and depending on your needs) is to sort the string analyzed:

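import re
s = 'abcdabcde'  # renamed from str to avoid shadowing the built-in
# sorting places identical characters next to each other
re.findall(r'(\w).*\1', ''.join(sorted(s)))

which returns the list of repeated characters: ['a', 'b', 'c', 'd']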
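If sorting is not desirable, another option (not part of the original answer, so treat it as a sketch) is to move the backreference into a lookahead. The lookahead checks that the same character occurs again later without consuming any text, so the scan continues from the next character. The helper name repeated_chars is hypothetical.

```python
import re

def repeated_chars(text):
    # (\w) captures one word character; (?=.*\1) only succeeds if that
    # same character appears again somewhere later in the string.
    hits = re.findall(r'(\w)(?=.*\1)', text)
    # A character occurring three or more times is reported more than
    # once, so drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(hits))

print(repeated_chars('abcdabcd'))   # ['a', 'b', 'c', 'd']
print(repeated_chars('capitals'))   # ['a']
print(repeated_chars('butterfly'))  # ['t']
```

Note that . does not match a newline by default, so for multi-line input the pattern would need the re.DOTALL flag.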
