Reputation: 53
I have a question about regex backreferences.
I need to find repeated characters in a string. I tried the regex (\w)\1{1,}
to capture the repeated characters, but it only captures consecutive repeats. I'm stuck on how to improve it so that it captures all repeated characters. Some examples:
import re
str = 'capitals'
re.search(r'(\w)\1{1,}', str)
Output: None
import re
str = 'butterfly'
re.search(r'(\w)\1{1,}', str)
<_sre.SRE_Match object; span=(2, 4), match='tt'>
Upvotes: 3
Views: 13222
Reputation: 69
I hope the code below helps you understand backreferences in Python regular expressions.
The sample string str contains two sets of information:
Employee basic info (entries that start with @)
Employee designation (entries wrapped in %)
import re
#sample input
str="""
@daniel dxc chennai 45000 male daniel @henry infosys bengaluru 29000 male hobby-swimming henry
@raja zoho chennai 37000 male raja @ramu infosys bengaluru 99000 male hobby-badminton ramu
%daniel python developer daniel% %henry database admin henry%
%raja Testing lead raja% %ramu Manager ramu%
"""
#backreferencing employee name (\w+) <---- \1
#----------------------------------------------
basic_info=re.findall(r'@+(\w+)(.*?)\1',str)
print(basic_info)
#(%) <-- \1 and (\w+) <--- \2
#-------------------------------
designation=re.findall(r'(%)+(\w+)(.*?)\2\1',str)
print(designation)
#keep only the name and the designation text, dropping the % delimiter
for i in range(len(designation)):
    designation[i]=(designation[i][1],designation[i][2])
print(designation)
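If numbered backreferences get hard to follow, the same idea can be written with named groups. This is only a minimal sketch: the shortened sample string and the group names name and info are my own, not part of the code above.

```python
import re

# Hypothetical sample data, shortened from the example above.
text = "@daniel dxc chennai 45000 male daniel @raja zoho chennai 37000 male raja"

# (?P<name>\w+) captures the employee name;
# (?P=name) must then match exactly the same text again.
pattern = re.compile(r'@(?P<name>\w+)(?P<info>.*?)(?P=name)')

records = [(m.group('name'), m.group('info').strip())
           for m in pattern.finditer(text)]
print(records)
```

Here (?P=name) plays the same role as \1 in the numbered version.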
Upvotes: 2
Reputation: 151
I would use r'(\w).*\1'
so that any repeated character matches, even if there are special characters or spaces in between.
However, this won't work for strings whose repeats overlap, such as abcdabcd
: the match for the first repeated character (abcda) consumes b, c and d, so their repeats are never reported.
Check the demo: https://regex101.com/r/m5UfAe/1
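To see the limitation locally, here is a small sketch using the strings from this thread (the variable names are just for illustration):

```python
import re

pattern = r'(\w).*\1'

# Non-consecutive repeats are now found: 'a' ... 'a' in 'capitals'.
first = re.search(pattern, 'capitals')
print(first.group())

# With overlapping repeats, the first match ('abcda') swallows b, c and d,
# so the scan resumes after it and finds no further pairs.
overlapping = re.findall(pattern, 'abcdabcd')
print(overlapping)
```

The first call matches 'apita'; the second returns only ['a'].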
An alternative, depending on your needs, is to sort the string first so that repeated characters end up next to each other:
import re
str = 'abcdabcde'
re.findall(r'(\w).*\1', ''.join(sorted(str)))
which returns the list of repeated characters: ['a', 'b', 'c', 'd']
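Another option, if you want to keep the original character order, is a lookahead instead of sorting. This is a different technique from the one above, and only a sketch: the helper name repeated_chars is hypothetical, and it assumes single-line input, since . does not match a newline by default.

```python
import re

def repeated_chars(text):
    """Return each word character that occurs more than once, in first-seen order."""
    # (?=.*\1) only checks that the captured character appears again later;
    # it consumes nothing, so overlapping repeats are still found.
    hits = re.findall(r'(\w)(?=.*\1)', text)
    # A character that occurs three or more times is reported more than once,
    # so drop duplicates while preserving order.
    return list(dict.fromkeys(hits))

print(repeated_chars('abcdabcd'))  # ['a', 'b', 'c', 'd']
print(repeated_chars('capitals'))  # ['a']
print(repeated_chars('banana'))    # ['a', 'n']
```

Because the lookahead does not consume the text between the two occurrences, each character is tested on its own, which avoids the overlap problem described above.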
Upvotes: 6