Reputation: 632
I have a list of character sequences that I want to find in a string, collapsing multiple consecutive occurrences of each into a single occurrence.
But I am facing two problems: when I loop over them, the re.sub function does not replace the multiple occurrences, and when I have a smiley like :) it replaces ':' with ':)', which I don't want.
Here is the code that I tried.
end_of_line_chars = [".",";","!",":)",":-)","=)",":]",":-(",":(",":[","=(",":P",":-P",":-p",":p","=P"]
for i in end_of_line_chars:
    pattern = "[" + i + "]" + "+"
    str = re.sub(pattern,i,str)
If I try it with a single character, it works as shown below.
str = re.sub("[.]+",".",str)
But looping over the list of characters gives an error. How can I solve these two problems? Thanks for the help.
Upvotes: 0
Views: 1760
Reputation: 26022
re.escape(str) does the escaping for you. Separated with |, you can match alternatives. With (?:…) you do grouping without capturing. So:
import re
# Only in Python 2 (uncomment there; Python 3's map and filter are already lazy):
# from itertools import imap as map, ifilter as filter
# Escape all elements, e.g. ':-)' → r'\:\-\)' (Python 3.7+ leaves ':' unescaped):
esc = map(re.escape, end_of_line_chars)
# Wrap each element in a capturing group, so you know which element was found,
# and in a non-capturing group with repeats and optional trailing whitespace:
esc = map(r'(?:({})\s*)+'.format, esc)
# Compile an expression that finds any of these elements:
esc = re.compile('|'.join(esc))
# The function to turn a match of repeats into a single item:
def replace_with_one(match):
    # match.groups() holds the captures, where only the found one is truthy,
    # e.g. (None, None, None, None, ':-)', None, None, None, None, None, None, None, None, None, None, None)
    return next(filter(bool, match.groups()))
# This is how you use it:
esc.sub(replace_with_one, '.... :-) :-) :-) :-( .....')
# Returns: '.:-):-(.'
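If you would rather keep the whitespace between different items and avoid the callback, a backreference is one alternative. This is a minimal sketch of a different technique than the one above, not a drop-in equivalent: the function name collapse_repeats is made up for illustration, and sorting the items longest-first is a precaution I added so that a longer item is never shadowed by a shorter one.

```python
import re

end_of_line_chars = [".", ";", "!", ":)", ":-)", "=)", ":]", ":-(", ":(",
                     ":[", "=(", ":P", ":-P", ":-p", ":p", "=P"]

def collapse_repeats(text, tokens=end_of_line_chars):
    """Collapse consecutive repeats of the same token into one occurrence.

    Hypothetical helper: whitespace between *identical* repeats is dropped,
    whitespace between *different* tokens is left alone.
    """
    # Escape each token and try longer tokens first.
    alternatives = "|".join(
        re.escape(t) for t in sorted(tokens, key=len, reverse=True)
    )
    # Group 1 captures one token; \1 then requires the same token again,
    # optionally preceded by whitespace, one or more times.
    pattern = re.compile(r"({})(?:\s*\1)+".format(alternatives))
    return pattern.sub(r"\1", text)

print(collapse_repeats(".... :-) :-) :-) :-( ....."))
# prints: . :-) :-( .
```

Note the difference in output: the space between ':-)' and ':-(' survives here, whereas the version above swallows trailing whitespace after every item.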
Upvotes: 1
Reputation: 155323
If the things to replace are not single characters, character classes won't work. Instead, use non-capturing groups (and use re.escape so the literals aren't interpreted as regex special characters):
end_of_line_chars = [".",";","!",":)",":-)","=)",":]",":-(",":(",":[","=(",":P",":-P",":-p",":p","=P"]
for i in end_of_line_chars:
    pattern = r"(?:{})+".format(re.escape(i))
    str = re.sub(pattern,i,str)
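To see why the character class misbehaves, here is a small side-by-side comparison. It is only an illustrative sketch: the helper names collapse_with_class and collapse_with_group and the sample text are mine, not from the question.

```python
import re

def collapse_with_class(token, text):
    # "[" + token + "]+" treats the token as a SET of characters,
    # so for ":)" it matches any run made of ':' or ')'.
    return re.sub("[" + token + "]+", token, text)

def collapse_with_group(token, text):
    # "(?:...)+" matches repeats of the exact escaped sequence only.
    return re.sub(r"(?:{})+".format(re.escape(token)), token, text)

text = "wait: ok :):):) done..."

print(collapse_with_class(":)", text))  # the lone ':' is rewritten too
print(collapse_with_group(":)", text))  # only the repeated smiley collapses

# Inside a character class, ":-)" reads as a range from ':' down to ')',
# which is invalid, so building the pattern raises re.error.
try:
    collapse_with_class(":-)", text)
except re.error as exc:
    print("invalid pattern:", exc)
```

The first call turns "wait:" into "wait:)", which is the unwanted smiley from the question, and the last call shows where the error in the loop comes from.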
Upvotes: 0