XYZ

Reputation: 395

Python : Regex, Finding Repetitions on a string

I need to find repetitions in a text string. I already found a very nice, elegant solution here from @Tim Pietzcker.

I am happy with the solution as is, but I would like to know whether it's possible to extend it a little further so that it accepts a string containing whitespace.

For example, "a bcab c" would return [('abc', 2)].

I tried using the regex pattern r"([^\s]+?)\1+" with no luck. Any help is much appreciated.
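A quick way to see why that pattern finds nothing: [^\s] (the same as \S) keeps spaces out of the captured group, and \1 must then match that exact text immediately afterwards. In "a bcab c" no substring is immediately followed by a copy of itself, so any pattern of the form (X)\1+ comes up empty. A minimal check, assuming Python 3; find_repeats is a hypothetical helper name:

```python
import re

# The pattern from the question: a lazily matched run of non-whitespace
# characters, followed by one or more exact copies of it.
pattern = re.compile(r"([^\s]+?)\1+")

def find_repeats(s):
    # Hypothetical helper: (unit, count) for every run of a repeated unit.
    return [(m.group(1), len(m.group(0)) // len(m.group(1)))
            for m in pattern.finditer(s)]

print(find_repeats("abcabc"))    # [('abc', 2)]
print(find_repeats("a bcab c"))  # [] (no substring is directly followed by a copy of itself)
```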

Upvotes: 3

Views: 206

Answers (2)

sanooj

Reputation: 493

You should consider removing the whitespace from the text first. You can do that with a regex, too.

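>>> import re
>>> def repetitions(s):
...     # a lazily matched unit followed by one or more exact copies of it
...     r = re.compile(r"(.+?)\1+")
...     # delete all whitespace before searching
...     for match in r.finditer(re.sub(r'\s+', "", s)):
...         # count = full match length / unit length (// keeps it an int on Python 3)
...         yield (match.group(1), len(match.group(0)) // len(match.group(1)))
... 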

Output:

>>> list(repetitions("a bcab c"))
[('abc', 2)]
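If you also need to know which part of the unmodified string a repetition covers, one option (not part of this answer, just a sketch assuming Python 3) is to remember the original index of every character you keep, search the stripped string as above, and map the match boundaries back. The helper name repetitions_with_span is hypothetical, and "whitespace" here means whatever str.isspace() accepts:

```python
import re

def repetitions_with_span(s):
    """Hypothetical helper: like repetitions(), but also returns the slice of
    the ORIGINAL string (spaces included) that each repetition covers."""
    # Keep the original index of every non-whitespace character.
    kept = [(i, ch) for i, ch in enumerate(s) if not ch.isspace()]
    stripped = "".join(ch for _, ch in kept)
    for m in re.finditer(r"(.+?)\1+", stripped):
        unit = m.group(1)
        count = len(m.group(0)) // len(unit)
        # Map the match boundaries in `stripped` back to positions in `s`.
        start = kept[m.start()][0]
        end = kept[m.end() - 1][0] + 1
        yield unit, count, s[start:end]

print(list(repetitions_with_span("a bcab c")))
# [('abc', 2, 'a bcab c')]
```

The search itself is unchanged; only the bookkeeping is extra, so spaces can sit anywhere in each copy.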

If you still want to keep the spaces from the original text, try this regex: r"(\s*\S+\s*?\S*?)\1+". But it has limitations: for example, the spaces must be at the same positions in every copy of the repeated text.

>>> def repetitions(s):
...     # here the spaces are part of the repeated unit
...     r = re.compile(r"(\s*\S+\s*?\S*?)\1+")
...     for match in r.finditer(s):
...         yield (match.group(1), len(match.group(0)) // len(match.group(1)))
... 

Results:

>>> list(repetitions(" abc abc "))
[(' abc', 2)]
>>> list(repetitions("abc abc "))
[('abc ', 2)]
>>> list(repetitions(" ab c ab c "))
[(' ab c', 2)]
>>> list(repetitions("ab cab c "))
[('ab c', 2)]
>>> list(repetitions("blablabla"))
[('bla', 3)]
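One concrete limitation: because \1 has to reproduce the captured text exactly, spaces included, the question's own example (where the space sits in a different place in each copy) yields nothing with this regex, while the whitespace-stripping version finds it. A small side-by-side check, assuming Python 3; the two function names are hypothetical renames of the two approaches above:

```python
import re

def repetitions_keep_spaces(s):
    # Second approach: the spaces are part of the repeated unit.
    r = re.compile(r"(\s*\S+\s*?\S*?)\1+")
    for match in r.finditer(s):
        yield (match.group(1), len(match.group(0)) // len(match.group(1)))

def repetitions_strip_spaces(s):
    # First approach: remove all whitespace, then look for repeats.
    r = re.compile(r"(.+?)\1+")
    for match in r.finditer(re.sub(r"\s+", "", s)):
        yield (match.group(1), len(match.group(0)) // len(match.group(1)))

print(list(repetitions_keep_spaces("a bcab c")))   # []
print(list(repetitions_strip_spaces("a bcab c")))  # [('abc', 2)]
```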

Upvotes: 1

Fatih Aktaş

Reputation: 1574

Using (\S+ ?\S?)\1, you can make the pattern tolerate spaces in strings like the one below, where the space is at the same position in each copy of the repeated word "ab c":

ab cab c 
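A minimal check of that pattern on the string above, assuming Python 3. Note that it uses a single \1 rather than \1+, so it reports a pair of copies rather than the whole run:

```python
import re

# One or more non-spaces, an optional single space, an optional single
# non-space, then an exact copy of that whole group.
pattern = re.compile(r"(\S+ ?\S?)\1")

m = pattern.search("ab cab c ")
print(m.group(1))  # ab c
```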

However, if the spaces are not at the same positions in each copy of the repeated word, you have to replace the meaningless spaces with an empty string "" first and then look for the repeated words with your approach.
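A short sketch of that advice with the question's example string, assuming Python 3: the pattern finds nothing while the spaces are at different positions in each copy, and finds "abc" once they are removed with str.replace:

```python
import re

pattern = re.compile(r"(\S+ ?\S?)\1")

s = "a bcab c"  # the space is at a different position in each copy
print(pattern.search(s))  # None

# Replace the meaningless spaces with "" first, then search again.
m = pattern.search(s.replace(" ", ""))
print(m.group(1))  # abc
```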

Upvotes: 0
