daniel

Reputation: 37

How to detect that a string contains character repetition longer than a 2-length sequence

Currently:

def detect_repet(s):
    return_list=[]
    split_text= s.split('\n')
    print(split_text)
    for x in split_text:
        print(x)
    return

print(detect_repet('Well, sheep says beeeee and\ncat says miaaaaaaaw\nand cow would shout mooooooow'))

I am struggling to detect at least two identical characters in a row in a string. I've tried this, but the index goes out of range in later iterations:

my_string= "danieeeeel"
for i in range(len(my_string)):
    if(my_string[i]==my_string[i+1]==my_string[i+2]):
        print('YES')

    else:
        print('NO')

The ideal output of detect_repet would be ['beeeee', 'miaaaaaaaw', 'mooooooow']

Upvotes: 0

Views: 70

Answers (4)

fsimonjetz

Reputation: 5802

There's nothing to add to @rdas's answer in terms of solving your specific problem. Still, on a didactic note, I'd like to point to groupby from the standard-library itertools module. Whenever you need to group a sequence of elements [a, a, a, b, b, b] into chunks [[a, a, a], [b, b, b]], the first thing that should come to mind is groupby.

It yields (label, subsequence) tuples you can iterate over. Since subsequence is an iterator, you have to turn it into a list in order to calculate its length. With that in mind, another approach to your problem could be something like:

from itertools import groupby

def detect_repet(s):
    for group in groupby(s):
        if len(list(group[1])) > 2:
            return True
    return False

This can be made even more concise and efficient, but it illustrates the idea.

You'd use it like this:

>>> text = 'Well, sheep says beeeee and\ncat says miaaaaaaaw\nand cow would shout mooooooow'
>>> [word for word in text.split() if detect_repet(word)]
['beeeee', 'miaaaaaaaw', 'mooooooow']
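
The more concise variant mentioned above might look like the following. This is only a sketch: any() stops at the first sufficiently long run, and counting with sum() avoids building a list for every group. The name has_long_run and the min_len parameter are made up for illustration and are not part of the function shown earlier.

```python
from itertools import groupby

def has_long_run(word, min_len=3):
    # True if any character occurs at least min_len times in a row.
    # groupby yields one (character, run) pair per block of equal characters.
    return any(sum(1 for _ in run) >= min_len for _, run in groupby(word))

text = 'Well, sheep says beeeee and\ncat says miaaaaaaaw\nand cow would shout mooooooow'
print([word for word in text.split() if has_long_run(word)])
# ['beeeee', 'miaaaaaaaw', 'mooooooow']
```

Passing min_len=2 would also pick up words such as sheep, where a letter is merely doubled.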

Upvotes: 0

XxJames07-

Reputation: 1826

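You can check whether the slice line[i:i+3] consists of a single repeated character by converting it to a set and testing that the set has exactly one element. The loop runs len(line) - 2 times so that every slice is exactly three characters long; a shorter slice at the end of the line, such as a single character, would also produce a one-element set and give a false positive: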

def detect_repeat(string):
    retval = set()
    for line in string.split():
        for i in range(len(line) - 2):
            if len(set(line[i:i+3])) == 1:
                retval.add(line)
    return retval

or, with a set comprehension:

detect_repeat = lambda s: {line for line in s.split() for i in range(len(line) - 2) if len(set(line[i:i+3])) == 1}

Output either way:

{'miaaaaaaaw', 'beeeee', 'mooooooow'}

Upvotes: 0

rdas

Reputation: 21285

Your loop needs to stop at len(my_string) - 2, so that the index i + 2 always stays below len(my_string).
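
Applied to the snippet from the question, the corrected bound might look like this. It is a minimal sketch; the found flag is introduced here for illustration so that a single YES or NO is printed for the whole string rather than one per index.

```python
my_string = "danieeeeel"

# Stop at len(my_string) - 2 so that my_string[i + 2] is always a valid index.
# For strings shorter than three characters the range is empty and the loop is skipped.
found = False
for i in range(len(my_string) - 2):
    if my_string[i] == my_string[i + 1] == my_string[i + 2]:
        found = True
        break

print('YES' if found else 'NO')
# YES
```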

You should also use a set to avoid duplicate results from longer runs of the same char:

def detect_repet(string):
    retval = set()
    for line in string.split():
        for i in range(len(line) - 2):
            if line[i] == line[i + 1] == line[i + 2]:
                retval.add(line)
    return retval


print(detect_repet('Well, sheep says beeeee and\ncat says miaaaaaaaw\nand cow would shout mooooooow'))

Result:

{'beeeee', 'miaaaaaaaw', 'mooooooow'}

Upvotes: 1

Tim Biegeleisen

Reputation: 521457

I would use re.findall with the regex pattern \b(\w*(\w)\2\w*)\b:

import re

inp = "Well, sheep says beeeee and\ncat says miaaaaaaaw\nand cow would shout mooooooow"
matches = [x[0] for x in re.findall(r'\b(\w*(\w)\2\w*)\b', inp)]
print(matches)  # ['Well', 'sheep', 'beeeee', 'miaaaaaaaw', 'mooooooow']

Note that this pattern matches any word in which a letter occurs two or more times in a row, so your sample input also turns up two other words: Well and sheep.
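
If only runs of three or more identical characters should count, as the expected output in the question suggests, the backreference can be given a {2,} quantifier. This is a sketch of that variation, not part of the pattern above:

```python
import re

inp = "Well, sheep says beeeee and\ncat says miaaaaaaaw\nand cow would shout mooooooow"
# (\w)\2{2,} requires a word character followed by at least two more copies of it,
# i.e. a run of three or more identical characters somewhere inside the word.
matches = [x[0] for x in re.findall(r'\b(\w*(\w)\2{2,}\w*)\b', inp)]
print(matches)  # ['beeeee', 'miaaaaaaaw', 'mooooooow']
```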

Upvotes: 2
