How do I remove multiple occurrences of a pattern from a string in Python?

Question

I want to remove every occurrence of a pattern in a Python string, where the pattern looks like "start-string blah, blah, blah end-string". This is a general problem I'd like to be able to handle. It is the same problem as "How can I remove a portion of text from a string whenever it starts with &*( and ends with )(*", but in Python rather than Java.

How would I solve the same problem in Python?

Assume the string looks like this:

'Bla bla bla  bla bla bla. Yadda yadda yadda  yadda.'

The start of the block to remove is and the end is />. So I do the following:



import re
mystring = "Bla bla bla  bla bla bla. Yadda yadda yadda  yadda."
tags = ""
re.sub('%s.*%s' % tags, '', mystring)

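For reference, here is a minimal sketch that reproduces the greedy behavior. It assumes a hypothetical start string "<img" and made-up src attributes; only the end string "/>" comes from the question. Note that '%s.*%s' % tags only works when tags is a two-element tuple, and re.escape is used so each marker is matched literally.

```python
import re

# Hypothetical start marker "<img"; the end marker "/>" is from the question.
tags = ("<img", "/>")
mystring = 'Bla bla bla <img src="a.png" /> bla bla bla. Yadda yadda yadda <img src="b.png" /> yadda.'

# "%s.*%s" % tags needs a 2-tuple; re.escape makes each marker match literally.
pattern = "%s.*%s" % tuple(re.escape(t) for t in tags)

# Greedy ".*" spans from the first "<img" to the last "/>", so both tags and
# all the text between them are removed as a single match.
greedy = re.sub(pattern, "", mystring)
print(repr(greedy))
```

This prints 'Bla bla bla  yadda.', which matches the output the question reports.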

My desired output is

'Bla bla bla  bla bla bla. Yadda yadda yadda  yadda.'


But what I get is

'Bla bla bla  yadda.'


So the pattern is clearly matching from the first occurrence of the start string all the way to the last occurrence of the end string, and removing everything in between as one block.
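One way to see this is to list what each pattern actually matches with re.findall. This sketch uses the same hypothetical "<img" start string and compares the greedy .* quantifier with the non-greedy .*? one.

```python
import re

# Hypothetical sample: "/>" is the question's end string; "<img" is an assumed start string.
text = 'Bla bla bla <img src="a.png" /> bla bla bla. Yadda yadda yadda <img src="b.png" /> yadda.'

# Greedy: one long match from the first "<img" to the last "/>".
greedy_matches = re.findall(r"<img.*/>", text)
# Non-greedy: each match stops at the first "/>" after its "<img".
lazy_matches = re.findall(r"<img.*?/>", text)

print(len(greedy_matches), greedy_matches)
print(len(lazy_matches), lazy_matches)
```

The greedy pattern finds a single match covering both tags and the sentence between them, while the non-greedy pattern finds the two tags separately.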

How do I make it match the pattern twice and give me the desired output? This has to be easy, but despite searching for "remove multiple occurrences regex Python" and the like, I have not found an answer. Thanks.

Cory Kramer · Accepted Answer

You basically want to find anything between ' and '/>', so you start with the pattern



r''


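However, .* is greedy: it matches as much text as possible, so a single match runs from the first start string to the last '/>'. To make it non-greedy, add a ? (giving .*?), so that each match stops at the first '/>' it reaches. Then use re.sub to replace every match with an empty string: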

>>> re.sub(r'', '', s)
'Bla bla bla  bla bla bla. Yadda yadda yadda  yadda.'
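Since the question asks for a general solution, here is a minimal sketch that wraps the non-greedy pattern in a reusable function. The name remove_blocks is hypothetical, and the "<img" start string in the usage is an assumption. re.escape lets arbitrary literal markers, such as "&*(" and ")(*" from the Java question, be used without being read as regex syntax, and re.DOTALL lets a block span line breaks (by default . does not match a newline).

```python
import re


def remove_blocks(text, start, end):
    """Remove every non-overlapping block that begins with `start` and ends
    with the nearest following `end` (hypothetical helper name)."""
    # re.escape treats the markers as literal text, not regex syntax;
    # ".*?" is non-greedy, so each match stops at the first `end` it reaches.
    pattern = re.escape(start) + ".*?" + re.escape(end)
    # re.DOTALL lets "." match newlines, so a block may span several lines.
    return re.sub(pattern, "", text, flags=re.DOTALL)


s = 'Bla bla bla <img src="a.png" /> bla bla bla. Yadda yadda yadda <img src="b.png" /> yadda.'
cleaned = remove_blocks(s, "<img", "/>")
print(repr(cleaned))

# The markers from the Java question work too, since they are escaped.
java_style = remove_blocks("keep &*(drop)(* this &*(also)(* text", "&*(", ")(*")
print(repr(java_style))
```

One caveat of this design: it does not handle nested blocks. Each match simply ends at the first end string after a start string.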
