Reputation: 87
I have a text that contains a lot of brackets with text between them. To remove these brackets (and the text inside them) I wrote this:
import re

def generalDatacleaning(mystring):
    result = re.sub(r'[]', '', mystring)
    print(result)
Running this on a sample sentence, however, gives me "ete" (the inside of the brackets):
test = "[ete], this is a text"
generalDatacleaning(test)
What should I change so the [text] part is removed?
Upvotes: 1
Views: 844
Reputation: 26900
This works:
re.sub(r"\[[^]]*\]", "", test)
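The pattern matches the opening bracket, then any run of characters that are not a closing bracket, then the closing bracket. Each match is replaced with an empty string.
This is more efficient than .*? because the negated character class can never run past the closing bracket, so the engine does not need what's called "backtracking". It also works with newlines inside the brackets, which . does not match unless re.DOTALL is set.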
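As a rough sketch of how this could be wrapped up, here is the same pattern inside a small helper, checked against the sample sentence and against a string with a newline inside the brackets. The name remove_bracketed and the multi-line test string are made up for illustration; only the pattern itself comes from the answer above.

```python
import re

# "\[" opening bracket, "[^]]*" any characters except "]", "\]" closing bracket.
BRACKETED = re.compile(r"\[[^]]*\]")

def remove_bracketed(text):
    """Return text with every [...] span removed (hypothetical helper name)."""
    return BRACKETED.sub("", text)

test = "[ete], this is a text"
print(remove_bracketed(test))  # , this is a text

# Assumed example input: a bracketed span that contains a newline.
multiline = "keep [drop\nthis] keep"
print(remove_bracketed(multiline) == "keep  keep")  # True

# Without re.DOTALL, "." does not match "\n", so the lazy version
# finds no match here and leaves the string unchanged.
print(re.sub(r"\[.*?\]", "", multiline) == multiline)  # True
```

Note that the comma and space after the bracket are left in place, since the pattern only covers the brackets and their contents. The function returns the result instead of printing it, so it can be reused on other strings.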
Upvotes: 1