The 29th Saltshaker

Reputation: 691

How to look for consecutive repeats in regex?

I am trying to find matches where an alphanumeric character is repeated consecutively. I tried re.match("(\w)[\\1][\\1]", mystring), but it doesn't work (it always returns None). What I am trying to say is "whatever alphanumeric character is captured in the parentheses, check whether it occurs twice in a row anywhere in the string."

Upvotes: 1

Views: 3427

Answers (3)

Nir Alfasi

Reputation: 53525

Close enough :)

You can use re.findall or re.search:

import re

mystring = 'abccd'
print(re.findall(r'(\w)\1', mystring))  # ['c']

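Your attempt with match fails for two reasons. First, match only tries the pattern at the beginning of the string, and 'abccd' does not begin with a repeated character. Second, in "(\w)[\\1][\\1]" the square brackets turn \1 into a character class instead of a backreference, so it never refers to the captured group.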
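To make the difference between the two functions concrete, here is a small sketch that runs the same corrected pattern through both. The variable names `anchored` and `found` are only illustrative.

```python
import re

mystring = 'abccd'
pattern = r'(\w)\1'  # one word character followed by the same character

# re.match only attempts the pattern at position 0 ('a' followed by 'b').
anchored = re.match(pattern, mystring)

# re.search scans forward until the pattern matches somewhere.
found = re.search(pattern, mystring)

print(anchored)        # None
print(found.group(0))  # cc
print(found.start())   # 2
```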

If you want to use match you can still do so, but the pattern then has to consume the leading text itself (here with .*), and you need to read the captured group from the match object:

import re

mystring = 'abccd'
m = re.match(r'.*(\w)\1', mystring)
print(m.group(1))  # 'c'
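One thing to watch with the match approach: .* is greedy, so when the string contains several repeats it reports the last one. A lazy .*? should report the first instead. A short sketch, using a made-up input string:

```python
import re

text = 'aabbcc'  # hypothetical input with three repeated pairs

# Greedy prefix: .* grabs as much as possible, so the LAST pair is captured.
last_pair = re.match(r'.*(\w)\1', text).group(1)

# Lazy prefix: .*? grabs as little as possible, so the FIRST pair is captured.
first_pair = re.match(r'.*?(\w)\1', text).group(1)

print(last_pair)   # c
print(first_pair)  # a
```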

Upvotes: 2

R Nar

Reputation: 5515

>>> import re
>>> pat = re.compile(r'(\w)\1')
>>> pat.findall('1234456678')
['4', '6']

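You were very close, but remember that [] defines a character class: it matches one character out of the set listed inside it, it does not capture anything, and a backreference has no special meaning there. Your non-raw string "[\\1]" reaches the regex engine as [\1], which Python reads as the single character with octal code 1 rather than "whatever group 1 captured". Also, if you repeat the backreference twice, as in (\w)\1\1, it will look for triples, since the captured group itself counts as the first occurrence.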
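The following sketch illustrates both points. The raw string r'(\w)[\1][\1]' is meant to be the same regex that the question's non-raw literal produces; the names `broken` and `triples` are just for illustration.

```python
import re

# The question's pattern: inside [], \1 is an octal escape for chr(1),
# not a backreference to group 1.
broken = re.compile(r'(\w)[\1][\1]')

# Backreference written outside a character class, repeated twice:
# the captured character plus two more copies, i.e. a triple.
triples = re.compile(r'(\w)\1\1')

print(broken.search('aaa'))                    # None
print(broken.search('a\x01\x01') is not None)  # True
print(triples.search('xaaay').group(0))        # aaa
print(triples.search('xaay'))                  # None
```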

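Note that this only matches doubles. If you want to match runs of any length (two or more), add a + quantifier at the end of the regex: (\w)\1+. findall will still return only the captured character for each run, not the whole run.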
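As a rough sketch of the + variant: findall returns only the captured group, so if you also want the full run, one option is finditer with group(0). The input string here is made up for the example.

```python
import re

pat = re.compile(r'(\w)\1+')  # a word character repeated one or more extra times
text = 'aabcccd1111'          # hypothetical input

# findall returns the contents of group 1 for each match.
chars = pat.findall(text)

# finditer yields match objects, so group(0) gives the entire run.
runs = [m.group(0) for m in pat.finditer(text)]

print(chars)  # ['a', 'c', '1']
print(runs)   # ['aa', 'ccc', '1111']
```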

Upvotes: 0

dawg

Reputation: 103744

You can do something like:

import re
test='abcaabbccaaa123333333'
# Group 1 is the whole run, group 2 is the repeated character
print(re.findall(r'(([a-zA-Z0-9])\2+)', test))

Prints:

[('aa', 'a'), ('bb', 'b'), ('cc', 'c'), ('aaa', 'a'), ('3333333', '3')]

Upvotes: 1
