Reputation: 899
I'm struggling to come up with a regular expression that finds any string that is wrapped in square brackets. I have a massive document scraped from wikipedia containing many paragraphs like this:
"Theology translates into English from the Greek theologia (θεολογία) which derived from Τheos (Θεός), meaning "God," and -logia (-λογία),**[12]** meaning "utterances, sayings, or oracles" (a word related to logos **[λόγος][Citation needed]**".
The desired result would be:
"Theology translates into English from the Greek theologia (θεολογία) which derived from Τheos (Θεός), meaning "God," and -logia (-λογία), meaning "utterances, sayings, or oracles" (a word related to logos".
Any suggestions would be greatly appreciated. Thanks!
Upvotes: 1
Views: 1900
Reputation: 3002
Are those **
in the document too? You only mention square brackets.
If they are, the regex would be
(\*\*\[.*?\]\*\*)
If it's really just the square brackets, then this matches what you're after:
(\[.*?\])
You don't mention a language, but in Python this would be accomplished by
import re
my_re = r'(\*\*\[.*?\]\*\*)'
my_string = '"Theology translates into English from the Greek theologia (θεολογία) which derived from Τheos (Θεός), meaning "God," and -logia (-λογία),**[12]** meaning "utterances, sayings, or oracles" (a word related to logos **[λόγος][Citation needed]**".'
my_corrected_string = re.sub(my_re, '', my_string)
print(my_corrected_string)
Upvotes: 5