user3222101

Reputation: 1330

How to replace multiple consecutive repeating characters with a single character in Python?

I have a string in Python and I want to collapse each run of consecutive repeated characters into a single character. For example:

st = "UUUURRGGGEENNTTT"
print(st.replace(r'(\w){2,}',r'\1'))

But this command doesn't seem to work. Can anybody help me find what's wrong with it?

There is another way to solve this, but I want to understand why the command above fails and whether there is any way to correct it:

print(re.sub(r"([A-Z])\1+", r"\1", st))  # prints URGENT

Upvotes: 1

Views: 6038

Answers (3)

HSLM

Reputation: 2012

You need to use a regex, so you can do this:

import re

re.sub(r'[^\w\s]|(.)(?=\1)', '', 'UUURRRUU')

The result is URU.


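for this regex: [^\w\s]|(.)(?=\1)

[^\w\s] means: match any char that is not a word char or whitespace (these get removed too)
(.) means: capture any single char except new lines (line breaks)
(?=...) means: lookahead, check what comes next without consuming it
\1 means: match the text captured by group 1, which is the U or R ...

so (.)(?=\1) matches a char only when the next char is the same one. With .* inside the lookahead, (.)(?=.*\1), it would instead match a char that appears again anywhere later in the string, not only right after it.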

then replace all matches with ''
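A small sketch to make the lookahead concrete. The helper names (drop_adjacent_repeats, drop_later_repeats) are made up for illustration; only re.sub comes from the standard library. It compares a lookahead for the very next char with a lookahead that has .* in front of the backreference:

```python
import re

def drop_adjacent_repeats(text):
    # (.)(?=\1): delete a char when the very next char is the same
    return re.sub(r'(.)(?=\1)', '', text)

def drop_later_repeats(text):
    # (.)(?=.*\1): delete a char when it occurs again anywhere later
    return re.sub(r'(.)(?=.*\1)', '', text)

print(drop_adjacent_repeats('UUURRRUU'))          # URU
print(drop_later_repeats('UUURRRUU'))             # RU
print(drop_adjacent_repeats('UUUURRGGGEENNTTT'))  # URGENT
```

The two only agree when no character comes back after a different one, as in the URGENT example.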

also you can check this: lookahead

Also check this tool. I work out my regexes with it; it describes everything and you can learn a lot from it: regexer

Upvotes: 10

Taku

Reputation: 33704

Your code does not work because str.replace does not support regex; it can only replace a literal substring with another string. You will need to use the re module if you want to replace by matching a regex pattern.

Secondly, your regex pattern is also incorrect: (\w){2,} matches any run of 2 or more word characters (they don't have to be the same character), so it will not work. You will need to do something like this:

import re
st = "UUUURRGGGEENNTTT"
print(re.sub(r'(\w)\1+', r'\1', st))
# URGENT

Now this will only match the same character 2 or more times.
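To see the difference between the two patterns side by side, here is a minimal sketch that runs both through re.sub on the string from the question (the variable names any_run and same_run are just for illustration):

```python
import re

st = "UUUURRGGGEENNTTT"

# (\w){2,}: two or more word characters of any kind. The whole string is a
# single match, and group 1 ends up holding only the last character matched.
any_run = re.sub(r'(\w){2,}', r'\1', st)

# (\w)\1+: one word character followed by one or more copies of itself.
same_run = re.sub(r'(\w)\1+', r'\1', st)

print(any_run)   # T
print(same_run)  # URGENT
```

So even with re.sub, the original pattern would collapse the entire string to its last character instead of collapsing each run.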

An alternative, “unique” solution is to use the unique_justseen recipe from the itertools documentation:

from itertools import groupby
from operator import itemgetter

st = "UUUURRGGGEENNTTT"
new ="".join(map(next, map(itemgetter(1), groupby(st))))

print(new)
# URGENT
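If the nested map calls are hard to read, the same idea can be sketched with the keys that groupby yields. The function name squeeze is hypothetical, not something itertools provides:

```python
from itertools import groupby

def squeeze(text):
    # groupby yields (key, group) for each run of equal items; keep the keys
    return "".join(key for key, _group in groupby(text))

print(squeeze("UUUURRGGGEENNTTT"))  # URGENT
print(squeeze("aabbaa"))            # aba
```

This only merges adjacent repeats, so a character that comes back later in the string is kept.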

Upvotes: 3

CtheSky

Reputation: 2624

str.replace(old, new[, count]) only does literal substring replacement:

>>> '(\w){2,}'.replace(r'(\w){2,}',r'\1') 
'\\1'

That's why it fails. str.replace can't work with regular expressions, so there is no way to correct the first command.
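A quick check of this on the string from the question, as a sketch (the variable names unchanged and collapsed are illustrative):

```python
import re

st = "UUUURRGGGEENNTTT"

# str.replace searches for the literal text '(\w){2,}', which st does not
# contain, so the string comes back unchanged.
unchanged = st.replace(r'(\w){2,}', r'\1')
print(unchanged == st)  # True

# re.sub interprets its first argument as a regular expression.
collapsed = re.sub(r'(\w)\1+', r'\1', st)
print(collapsed)  # URGENT
```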

Upvotes: 1
