Reputation: 225
Is there a way to reduce duplicated characters to a specific number? For example, given this string:
"I liiiiked it, thaaaaaaank you"
Expected output: "I liiiiked it, thaaaank you"
So if a character is repeated more than 4 times, for example, the run should be reduced to only four characters, and if it is repeated 4 times or fewer, the word should stay the same.
Upvotes: 4
Views: 2179
Reputation: 304215
>>> import re
>>> s="I liiiiked it, thaaaaaaank you"
>>> re.sub(r"(.)(\1{3})(\1+)", r"\1\2", s)
'I liiiiked it, thaaaank you'
This regular expression looks for 3 groups.
The first is any character. The second is 3 more of that same character, and the third is one or more of the first character.
Those 3 groups are then replaced by just group 1 and group 2, which together are 4 characters.
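To see what each group captures, here is a small sketch that runs the same pattern with re.search on the example string. The variable names m and groups are only for illustration.

```python
import re

s = "I liiiiked it, thaaaaaaank you"

# The run "iiii" has only 4 characters, so the first match is the run of seven a's.
m = re.search(r"(.)(\1{3})(\1+)", s)
groups = m.groups()

print(m.group(0))  # aaaaaaa
print(groups)      # ('a', 'aaa', 'aaa')
```

Group 1 plus group 2 is 'aaaa', which is what the replacement r"\1\2" keeps.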
Here is an even simpler method:
>>> re.sub(r"(.)\1{4,}", r"\1"*4, s)
'I liiiiked it, thaaaank you'
This time there is just one group, (.), which is the first letter of the match. This must be followed by the same letter 4 or more times, \1{4,}, so it matches 5 or more of the same letter. The replacement is just that letter repeated 4 times.
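If the limit should be configurable, the same idea can be wrapped in a function. This is a minimal sketch; the name limit_repeats and the max_reps parameter are made up for illustration and are not part of the re module.

```python
import re

def limit_repeats(text, max_reps=4):
    """Cut every run of a repeated character down to at most max_reps copies."""
    # (.) captures one character; \1{max_reps,} requires at least max_reps
    # further copies, so only runs longer than max_reps are matched.
    pattern = r"(.)\1{%d,}" % max_reps
    # The replacement is the captured character repeated max_reps times.
    return re.sub(pattern, r"\1" * max_reps, text)

print(limit_repeats("I liiiiked it, thaaaaaaank you"))     # I liiiiked it, thaaaank you
print(limit_repeats("I liiiiked it, thaaaaaaank you", 2))  # I liiked it, thaank you
```

Note that "." does not match a newline by default, so runs of newlines are left alone unless re.DOTALL is passed.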
Upvotes: 12
Reputation: 60080
You can do this with a single scan through the input string. Just keep a count of the current character and don't add it to the output if you've got too many repeats:
input_string = "I liiiiked it, thaaaaaaank you"
max_reps = 4
prev_char = None
rep_count = 0
output = ""
for char in input_string:
    if char != prev_char:
        # A new run starts: reset the counter and keep the character.
        rep_count = 1
        prev_char = char
        output += char
    else:
        # Same run: keep the character only while under the limit.
        if rep_count < max_reps:
            rep_count += 1
            output += char
        else:
            rep_count += 1
A version that's possibly faster by avoiding string concatenation (see this question):
input_string = "I liiiiked it, thaaaaaaank you"
max_reps = 4
prev_char = None
rep_count = 0
output_list = []
for char in input_string:
    if char != prev_char:
        # A new run starts: reset the counter and keep the character.
        rep_count = 1
        prev_char = char
        output_list.append(char)
    else:
        # Same run: keep the character only while under the limit.
        if rep_count < max_reps:
            rep_count += 1
            output_list.append(char)
        else:
            rep_count += 1
# Build the result string once at the end.
output = ''.join(output_list)
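For comparison, the same single-pass idea can also be written with itertools.groupby, which is a different technique from the explicit counter above. This is only a sketch, and the function name limit_repeats_groupby is made up for illustration.

```python
from itertools import groupby, islice

def limit_repeats_groupby(text, max_reps=4):
    """Keep at most max_reps characters from each run of equal characters."""
    # groupby yields one group per run of equal adjacent characters;
    # islice takes at most max_reps items from each run.
    return "".join(
        "".join(islice(run, max_reps)) for _, run in groupby(text)
    )

print(limit_repeats_groupby("I liiiiked it, thaaaaaaank you"))  # I liiiiked it, thaaaank you
```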
Upvotes: 2
Reputation: 22904
Not the best solution - my regex needs to be fixed... I think
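import re

def rep(o):
    # o is the match object for one run of a repeated word character.
    g = o.group(0)
    if len(g) > 4:
        # Keep the first 4 characters of the run.
        return g[0:4]
    return g

foo = 'iiiiiiii liiiiiiikkkkkkkkkeeeee fooooooddd'
foo1 = re.sub(r'(\w)\1+', rep, foo)
# foo1 is 'iiii liiiikkkkeeee fooooddd'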
You can probably start tinkering with this if you are so inclined.
Upvotes: 1