Reputation: 225

How to reduce the duplicated characters in a string using Python

Is there a way to reduce duplicated characters to a specific number? For example, suppose we have this string:

"I liiiiked it, thaaaaaaank you"

Expected output: "I liiiiked it thaaaank you"

So if a character is repeated more than 4 times, for example, the run should be reduced to only four characters, and if it is repeated 4 times or fewer, the word should stay the same.

Upvotes: 4

Answers (3)

John La Rooy

Reputation: 304215

>>> import re
>>> s="I liiiiked it, thaaaaaaank you"
>>> re.sub(r"(.)(\1{3})(\1+)", r"\1\2", s)
'I liiiiked it, thaaaank you'

This regular expression looks for 3 groups.

The first is any character. The second is 3 more of that same character, and the third is one or more of the first character.

Those 3 groups are then replaced by just group 1 and group 2, which together make 4 copies of the character.
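
To see what each group captures, here is a small sketch that prints the groups for every match. The sample string and the variable names are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical sample: a run of 7 "a" characters, built explicitly so the count is clear.
sample = "th" + "a" * 7 + "nk"
pattern = re.compile(r"(.)(\1{3})(\1+)")

# Collect (group 1, group 2, group 3) for each run of 5 or more identical characters.
captured = [m.groups() for m in pattern.finditer(sample)]
print(captured)

# Replacing the match with groups 1 and 2 keeps 1 + 3 = 4 copies of the character.
result = pattern.sub(r"\1\2", sample)
print(result)
```

Group 3 is greedy, so it swallows everything in the run beyond the first four characters, and that is the part that gets dropped.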

Here is an even simpler method:

>>> re.sub(r"(.)\1{4,}", r"\1"*4, s)
'I liiiiked it, thaaaank you'

This time there is just one group (.), which is the first letter of the match. This must be followed by the same letter 4 or more times \1{4,}. So it matches 5 or more of the same letter. The replacement is just that letter repeated 4 times.
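
If the limit needs to be configurable, the same idea can probably be wrapped in a function. This is only a sketch: the name `limit_repeats` and the `max_reps` parameter are my own assumptions, not something from the answer above.

```python
import re

def limit_repeats(text, max_reps=4):
    """Shorten any run of one character longer than max_reps to max_reps.

    Hypothetical helper for illustration only.
    """
    if max_reps < 1:
        raise ValueError("max_reps must be at least 1")
    # (.) captures one character; \1{max_reps,} requires at least max_reps
    # more copies of it, so the whole pattern matches max_reps + 1 or more.
    pattern = r"(.)\1{%d,}" % max_reps
    # A function is used as the replacement so the count is not hard-coded.
    return re.sub(pattern, lambda m: m.group(1) * max_reps, text)

print(limit_repeats("I liiiiked it, th" + "a" * 7 + "nk you"))
print(limit_repeats("aaabbb", max_reps=1))
```

Note that `.` does not match a newline by default, so runs of blank lines are left alone unless `re.DOTALL` is passed.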

Upvotes: 12

Marius

Reputation: 60080

You can do this with a single scan through the input string. Keep a count of how many times the current character has repeated, and don't add it to the output once the run has reached the limit:

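input_string = "I liiiiked it, thaaaaaaank you"

max_reps = 4      # longest run of one character to keep
prev_char = None
rep_count = 0
output = ""

for char in input_string:
    if char != prev_char:
        # A new run starts: reset the counter and keep the character.
        rep_count = 1
        prev_char = char
        output += char
    else:
        rep_count += 1
        # Keep the character only while the run is within the limit.
        if rep_count <= max_reps:
            output += char

print(output)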

A version that's possibly faster by avoiding string concatenation (see this question):

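input_string = "I liiiiked it, thaaaaaaank you"

max_reps = 4      # longest run of one character to keep
prev_char = None
rep_count = 0
output_list = []  # collect characters here and join once at the end

for char in input_string:
    if char != prev_char:
        # A new run starts: reset the counter and keep the character.
        rep_count = 1
        prev_char = char
        output_list.append(char)
    else:
        rep_count += 1
        # Keep the character only while the run is within the limit.
        if rep_count <= max_reps:
            output_list.append(char)

output = ''.join(output_list)
print(output)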
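
The same single-pass idea can also be sketched with `itertools.groupby`, which splits the string into runs of identical characters. This is a different technique from the manual counter above, and the function name `limit_repeats_groupby` is just an assumption for the example.

```python
from itertools import groupby

def limit_repeats_groupby(text, max_reps=4):
    """Keep at most max_reps characters from each run of identical characters.

    Hypothetical helper for illustration only.
    """
    pieces = []
    # groupby yields (character, iterator over that run) for consecutive equal items.
    for char, run in groupby(text):
        run_length = sum(1 for _ in run)
        pieces.append(char * min(run_length, max_reps))
    return "".join(pieces)

print(limit_repeats_groupby("I liiiiked it, th" + "a" * 7 + "nk you"))
```

Unlike a regex using `.`, this treats every character the same way, including newlines.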

Upvotes: 2

NG.

Reputation: 22904

Not the best solution - my regex needs to be fixed... I think

import re

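def rep(o):
    # o is the match object for one run of a repeated word character
    g = o.group(0)
    if len(g) > 4:
        return g[:4]  # keep only the first four characters of the run
    return g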

foo = 'iiiiiiii liiiiiiikkkkkkkkkeeeee fooooooddd'
foo1 = re.sub(r'(\w)\1+', rep, foo)

# iiii liiiikkkkeeee fooooddd

You can probably start tinkering with this if you are so inclined.
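
One thing to keep in mind when tinkering: `\w` only matches word characters, so runs of punctuation or spaces are left untouched, whereas `.` matches any character except a newline. A rough sketch of the difference follows. The sample text and the helper name `truncate_run` are made up for illustration.

```python
import re

def truncate_run(match, max_reps=4):
    # Slicing is safe for short runs too: a run of 2 or 3 is returned unchanged.
    return match.group(0)[:max_reps]

# Hypothetical sample: 7 "o" characters followed by 7 exclamation marks.
text = "w" + "o" * 7 + "w" + "!" * 7

word_only = re.sub(r"(\w)\1+", truncate_run, text)  # punctuation run is ignored
any_char = re.sub(r"(.)\1+", truncate_run, text)    # punctuation run is shortened

print(word_only)
print(any_char)
```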

Upvotes: 1
