dhdz

Reputation: 899

Regex: Find any string wrapped in square brackets

I'm struggling to come up with a regular expression that finds any string wrapped in square brackets. I have a massive document scraped from Wikipedia containing many paragraphs like this:

"Theology translates into English from the Greek theologia (θεολογία) which derived from Τheos (Θεός), meaning "God," and -logia (-λογία),**[12]** meaning "utterances, sayings, or oracles" (a word related to logos **[λόγος][Citation needed]**".

The desired result would be:

"Theology translates into English from the Greek theologia (θεολογία) which derived from Τheos (Θεός), meaning "God," and -logia (-λογία), meaning "utterances, sayings, or oracles" (a word related to logos".

Any suggestions would be greatly appreciated. Thanks!

Upvotes: 1

Views: 1900

Answers (1)

Are those ** in the document too? You only mention square brackets.

If they are, the regex would be

(\*\*\[.*?\]\*\*)

If it's really just the square brackets, then this matches what you're after:

(\[.*?\])

You don't mention a language, but in Python you could do this with re.sub:

import re

# Lazily match a **[...]** span; .*? stops at the first closing ]** it can reach
my_re = r'(\*\*\[.*?\]\*\*)'
my_string = '"Theology translates into English from the Greek theologia (θεολογία) which derived from Τheos (Θεός), meaning "God," and -logia (-λογία),**[12]** meaning "utterances, sayings, or oracles" (a word related to logos **[λόγος][Citation needed]**".'
# Replace every match with the empty string
my_corrected_string = re.sub(my_re, '', my_string)

print(my_corrected_string)
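
If you aren't sure whether the ** markers are really in the document, here is a rough sketch of one pattern that covers both cases. The function name strip_bracketed and the choice to also swallow whitespace directly before a bracket group are my own assumptions, not something from your question, so adjust them to your data. It uses a negated character class instead of a lazy quantifier and treats adjacent groups like [λόγος][Citation needed] as one match.

```python
import re

# Assumed pattern (not from the question): optional leading whitespace,
# optional ** before, one or more adjacent [...] groups, optional ** after.
# [^\]]* cannot run past a closing bracket, so no lazy quantifier is needed.
BRACKETED = re.compile(r'\s*(?:\*\*)?(?:\[[^\]]*\])+(?:\*\*)?')


def strip_bracketed(text):
    """Remove bracketed spans (with or without ** markers) from text."""
    return BRACKETED.sub('', text)


sample = ('and -logia (-λογία),**[12]** meaning "oracles" '
          '(a word related to logos **[λόγος][Citation needed]**".')
print(strip_bracketed(sample))
# and -logia (-λογία), meaning "oracles" (a word related to logos".
```

One thing to watch: because the ** markers are optional, a closing ** from unrelated bold text that sits directly in front of a bracket would be removed too. If your document has that, drop the optional ** parts and strip the markers in a separate step.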

Upvotes: 5
