Reputation: 1330
I have a string in Python and I want to collapse each run of consecutive repeated characters into a single character. For example:
st = "UUUURRGGGEENNTTT"
print(st.replace(r'(\w){2,}',r'\1'))
But this command doesn't seem to work. Can anybody help me find what's wrong with it?
There is another way to solve this, but I want to understand why the command above fails and whether there is any way to correct it:
print(re.sub(r"([A-Za-z])\1+", r"\1", st))  # prints URGENT
Upvotes: 1
Views: 6038
Reputation: 2012
You need to use a regex, so you can do this:
import re
re.sub(r'[^\w\s]|(.)(?=\1)', '', 'UUURRRUU')
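The result is URU: each run of repeated characters is reduced to a single character.
Here is a breakdown of the main part of the regex, (.)(?=\1):
(.) means: match any character except a newline (line break) and capture it as group 1
(?=...) means: a lookahead, which checks what follows without consuming it
\1 means: match the text captured by group 1, which is the U or R ...
So (.)(?=\1) matches a character only when the next character is identical. The other alternative, [^\w\s], matches any character that is neither a word character nor whitespace, so punctuation is removed as well.
Note that with .* in the lookahead, as in (.)(?=.*\1), a character is matched whenever the same character appears anywhere later in the string, not just immediately after it.
Then all matches are replaced with ''.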
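To make the difference between the two lookahead forms concrete, here is a minimal sketch that runs both on the same input. The variable names are only illustrative, and the [^\w\s] alternative is left out so that only the lookahead behaviour is shown.

```python
import re

s = "UUURRRUU"

# (.)(?=\1): delete a character only when the very next character is identical,
# so the last character of each run survives.
consecutive = re.sub(r"(.)(?=\1)", "", s)

# (.)(?=.*\1): delete a character when the same character occurs anywhere later,
# so only the last occurrence of each distinct character survives.
anywhere = re.sub(r"(.)(?=.*\1)", "", s)

print(consecutive)  # URU
print(anywhere)     # RU
```

For the asker's string "UUUURRGGGEENNTTT" both forms happen to give URGENT, because no character appears in more than one run.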
You can also read up on this: lookahead
Also check out this tool, which I use to work out my regexes. It describes everything and you can learn a lot from it: regexer
Upvotes: 10
Reputation: 33704
Your code does not work because str.replace does not support regex; it can only replace a literal substring with another string. You need to use the re module if you want to replace by matching a regex pattern.
Secondly, your regex pattern is also incorrect: (\w){2,} matches any run of two or more word characters, which do not have to be the same character, so it will not work. You need something like this:
import re
st = "UUUURRGGGEENNTTT"
print(re.sub(r'(\w)\1+', r'\1', st))
# URGENT
This only matches the same character repeated 2 or more times in a row.
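As a rough illustration of the difference between the two patterns, the sketch below applies both through re.sub to the string from the question. The names loose and strict are just labels chosen for this example.

```python
import re

st = "UUUURRGGGEENNTTT"

# (\w){2,} matches the whole run of word characters in one go. A repeated
# group keeps only its last repetition, so \1 is the final character.
loose = re.sub(r"(\w){2,}", r"\1", st)

# (\w)\1+ matches one word character followed by one or more copies of
# that same character, so each run is replaced by a single character.
strict = re.sub(r"(\w)\1+", r"\1", st)

print(loose)   # T
print(strict)  # URGENT
```

So even with re.sub, the original pattern would collapse the entire string to its last character rather than collapsing each run.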
An alternative solution is the unique_justseen recipe from the itertools documentation, which is built on groupby:
from itertools import groupby
from operator import itemgetter
st = "UUUURRGGGEENNTTT"
new = "".join(map(next, map(itemgetter(1), groupby(st))))
print(new)
# URGENT
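The same idea can be written a little more directly, since groupby already yields the key of each run. This is a minimal sketch; collapse_runs is a hypothetical helper name, not something itertools provides.

```python
from itertools import groupby


def collapse_runs(text):
    """Return text with each run of identical consecutive characters reduced to one."""
    # groupby yields (key, group) pairs, one per run; the key is the character.
    return "".join(key for key, _group in groupby(text))


print(collapse_runs("UUUURRGGGEENNTTT"))  # URGENT
```

Unlike the (\w)\1+ regex, this collapses runs of any character, including spaces and punctuation.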
Upvotes: 3
Reputation: 2624
string.replace(s, old, new[, maxreplace])
only does literal substring replacement:
>>> r'(\w){2,}'.replace(r'(\w){2,}', r'\1')
'\\1'
That's why it fails: it cannot work with a regular expression, so there is no way to correct the first command while still using replace.
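A small sketch of what this means for the string from the question: the pattern text never occurs literally in st, so replace returns it unchanged, and it only "matches" when the pattern characters themselves appear in the input. The variable names are illustrative.

```python
st = "UUUURRGGGEENNTTT"
pattern = r"(\w){2,}"

# The literal text "(\w){2,}" does not occur in st, so nothing is replaced.
unchanged = st.replace(pattern, r"\1")

# Only a string that literally contains the pattern text is affected.
literal_hit = ("x" + pattern + "y").replace(pattern, r"\1")

print(unchanged == st)  # True
print(literal_hit)      # x\1y
```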
Upvotes: 1