TIMEX

Reputation: 272094

How do I do this regex replacement in Python?

Given a string of text, in Python:

s = "(((((hi abc )))))))"
s = "***(((((hi abc ***&&&&"

How do I replace all non-alphabetic symbols that occur more than 3 times with a blank string?

For both of the above, the result should be:

hi abc

Upvotes: 3

Views: 282

Answers (3)

John Machin

Reputation: 82992

With a plain replacement string you can't easily replace that with a "blank string" that's the same length as the replaced text. You can replace it with an empty string "", a single space " ", or any other constant string of your choice; I've used "*" in the example so that it is easier to see what is happening.

>>> import re
>>> re.sub(r"(\W)\1{3,}", "*", "12345<><>aaaaa%%%11111<<<<..>>>>")
'12345<><>aaaaa%%%11111*..*'

Note carefully: it doesn't change "<><>". I'm assuming that "non-alphabetic symbols that occur more than 3 times" means the same symbol has to occur more than 3 times. I'm also assuming that you did mean "more than 3" and not "3 or more".
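If a blank run of the same length really is what you want, one option (a sketch, not part of the original answer) is to pass a function instead of a string as the replacement argument of re.sub; the function receives each match object and returns the text to put in its place. The variable names below are illustrative only. The same block also shows the "3 or more" reading, which just lowers the repeat count from {3,} to {2,}.

```python
import re

text = "12345<><>aaaaa%%%11111<<<<..>>>>"

# Same pattern as above: a non-word character followed by 3 or more copies
# of itself, i.e. 4 or more identical characters in a row.
more_than_3 = re.compile(r"(\W)\1{3,}")

# A callable replacement can depend on the match: here each run becomes
# the same number of spaces, so the overall length is unchanged.
blanked = more_than_3.sub(lambda m: " " * len(m.group(0)), text)
print(repr(blanked))

# For "3 or more" instead of "more than 3", require only 2 further repeats.
three_or_more = re.compile(r"(\W)\1{2,}")
print(three_or_more.sub("*", text))  # 12345<><>aaaaa*11111*..*
```

With the callable, "<<<<" and ">>>>" each become four spaces, so len(blanked) == len(text); with {2,}, "%%%" is now replaced as well.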

Upvotes: 0

Alex Martelli

Reputation: 882133

If you want to replace any sequence of non-space non-alphamerics (e.g. '!?&' as well as your examples), @Stephen's answer is fine. But if you only want to replace sequences of three or more identical non-alphamerics, a backreference will help:

>>> import re
>>> r3 = re.compile(r'(([^\s\w])\2{2,})')
>>> r3.findall('&&&xxx!&?yyy*****')
[('&&&', '&'), ('*****', '*')]

So, for example:

>>> r3.sub('', '&&&xxx!&?yyy*****')
'xxx!&?yyy'
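Applied to the two strings from the question, this pattern never matches whitespace, so the space that came before the trailing run of symbols is left behind. A small sketch (the loop and the results list are illustrative, not from the answer) showing that adding .strip() gives exactly 'hi abc':

```python
import re

# Pattern from the answer: a character that is neither whitespace nor a
# word character, followed by two or more copies of itself (3+ in a row).
r3 = re.compile(r'(([^\s\w])\2{2,})')

results = []
for s in ["(((((hi abc )))))))", "***(((((hi abc ***&&&&"]:
    removed = r3.sub('', s)
    # Whitespace is never matched, so the space before a removed run
    # survives; strip() trims it from both ends.
    results.append((removed, removed.strip()))
    print(repr(removed), repr(removed.strip()))  # 'hi abc ' 'hi abc'
```

Note that "***(((((" is removed as two separate runs, because the backreference \2 only repeats the same character.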

Upvotes: 4

Stephen

Reputation: 49206

This should work: \W{3,} matches runs of 3 or more consecutive non-alphanumeric characters (\W also matches whitespace, but not underscore):

>>> import re
>>> s = "***(((((hi abc ***&&&&"
>>> re.sub(r"\W{3,}", "", s)
'hi abc'
>>> s = "(((((hi abc )))))))"
>>> re.sub(r"\W{3,}", "", s)
'hi abc'
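Because \W means "any character that is not a letter, digit or underscore", this pattern is broader than "repeated symbols". The inputs below are made up purely to illustrate the edge cases; they are not from the question.

```python
import re

# Stephen's pattern, written as a raw string so "\W" is not treated as a
# string escape sequence.
pattern = re.compile(r"\W{3,}")

# Whitespace counts as \W: three spaces are removed and the words join up.
print(repr(pattern.sub("", "hi   abc")))    # 'hiabc'

# Mixed symbols form one run; they need not be the same character.
print(repr(pattern.sub("", "hi !?& abc")))  # 'hiabc'

# Underscore is a word character, so a run of underscores is left alone.
print(repr(pattern.sub("", "hi ___ abc")))  # 'hi ___ abc'
```

If joining words like this is a problem, the backreference approach in the other answers is stricter.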

Upvotes: 8
