Edward de Goeij

Reputation: 87

Remove brackets (and text within brackets) from a sentence

I have a text that contains many square brackets with text between them. To remove these brackets (and the text inside them), I wrote this:

import re

def generalDatacleaning(mystring):
    # Removes only the bracket characters themselves, not the text between them
    result = re.sub(r'[\[\]]', '', mystring)
    print(result)

Running it on a sample sentence, however, removes only the brackets and keeps the text that was inside them ("ete"):

test = "[ete], this is a text"
generalDatacleaning(test)

What should I change so the [text] part is removed?

Upvotes: 1

Views: 844

Answers (1)

Bharel

Reputation: 26900

This works:

re.sub(r"\[[^]]*\]", "", test)

The pattern matches an opening bracket, then any run of characters that are not a closing bracket, then the closing bracket. re.sub replaces the whole match with an empty string.
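As a minimal sketch, the same pattern can be wrapped in a reusable function that returns the cleaned string instead of printing it. The name remove_bracketed is hypothetical, not from the question. Note that the pattern does not handle nested brackets: on "[a [b] c]" it removes "[a [b]" and leaves " c]".

```python
import re

# Opening bracket, any run of non-"]" characters, closing bracket
# (the pattern from this answer), compiled once for reuse.
BRACKETED = re.compile(r"\[[^]]*\]")


def remove_bracketed(text: str) -> str:
    """Return text with every [...] group, brackets included, removed."""
    return BRACKETED.sub("", text)


if __name__ == "__main__":
    print(remove_bracketed("[ete], this is a text"))  # → ", this is a text"
    print(remove_bracketed("a [1] b [2] c"))          # → "a  b  c"
```

Surrounding punctuation and spaces are left as-is; strip or collapse them separately if needed.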

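This is generally more efficient than the lazy \[.*?\] (the negated character class avoids what's called "backtracking"), and it also works when the brackets contain newlines: [^]] matches a newline, whereas . does not unless the re.DOTALL flag is set.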
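A short sketch of the newline difference, using a made-up sample string whose bracketed part spans two lines (it does not measure the efficiency claim):

```python
import re

text = "[first\nsecond], this is a text"

# Lazy dot: "." does not match "\n" by default, so nothing is removed.
lazy = re.sub(r"\[.*?\]", "", text)

# Negated class: "[^]]" matches any character except "]", including "\n".
negated = re.sub(r"\[[^]]*\]", "", text)

# The lazy pattern only matches across lines when re.DOTALL is set.
lazy_dotall = re.sub(r"\[.*?\]", "", text, flags=re.DOTALL)

print(repr(lazy))         # → '[first\nsecond], this is a text'
print(repr(negated))      # → ', this is a text'
print(repr(lazy_dotall))  # → ', this is a text'
```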

Upvotes: 1
