Regex to select and replace spaces inside double brackets

Question

I'm writing a script to tidy up MediaWiki files before converting them to Confluence markup. In this particular case I need to fix page links, which in MediaWiki look something like this:

[[this is a page]]

The problem is that the actual page name would be this_is_a_page. The Universal Wiki Converter isn't smart enough to realise this when it converts to Confluence markup, so you end up with broken links.

I've been trying to write a regex as part of my Python script. I've already stripped out HTML and some other tags such as <gallery>. The following regex selects all the links in question:

'\[\[(.*?)\]\]'
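One way to build on that pattern, sketched here as an alternative to the lookahead in the answer (not something the question itself tried), is to pass a function as the replacement to re.sub. The function receives each link match and rewrites only the captured text, so spaces outside [[ ]] are never touched. The helper name underscore_links is hypothetical.

```python
import re

# The link-matching pattern from the question: a non-greedy capture of the
# text between [[ and ]].
LINK_RE = re.compile(r'\[\[(.*?)\]\]')


def underscore_links(text):
    """Hypothetical helper: replace spaces with underscores inside [[...]] links."""
    # re.sub accepts a callable as the replacement. It is called once per
    # match with the match object, and its return value is substituted.
    return LINK_RE.sub(
        lambda m: '[[' + m.group(1).replace(' ', '_') + ']]',
        text,
    )


result = underscore_links(' [[this is a page]]    goo hghg')
print(repr(result))  # prints ' [[this_is_a_page]]    goo hghg'
```

Because each substitution is scoped to one complete [[...]] match, a stray ]] with no opening [[ is left alone. Only literal spaces are converted; tabs inside a link would be kept.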

I just can't find a programmatic way to select only the spaces inside the [[ ]] so that I can substitute them with underscores. I've tried using matches, with no success.

Avinash Raj · Accepted Answer

Try the regex below and replace the matched spaces with underscores.

\s(?=[^\[\]]*]])


>>> import re
>>> s = " [[this is a page]]    goo hghg"
>>> m = re.sub(r'\s(?=[^\[\]]*]])', "_", s)
>>> m
' [[this_is_a_page]]    goo hghg'

\s(?=[^\[\]]*]]) matches a whitespace character only if it is followed by zero or more characters that are neither [ nor ], and then by the two closing ]] brackets.
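A minimal sketch of applying that lookahead to a line containing several links. The compiled pattern is the one from the answer; the helper name underscore_link_spaces and the sample text are hypothetical.

```python
import re

# The lookahead pattern from the answer: a whitespace character matches only
# if the text after it is a run of characters containing no '[' or ']'
# followed by the closing ']]'.
SPACE_IN_LINK_RE = re.compile(r'\s(?=[^\[\]]*]])')


def underscore_link_spaces(text):
    """Hypothetical helper: apply the lookahead regex to a whole string."""
    return SPACE_IN_LINK_RE.sub('_', text)


sample = 'See [[first page]] and [[another page name]] for details.'
print(underscore_link_spaces(sample))
# prints: See [[first_page]] and [[another_page_name]] for details.
```

The lookahead only inspects text to the right of each space. A space followed by ]] with no opening [[ (for example "a b]]") is therefore also replaced. Because \s matches any whitespace, tabs and newlines inside a link become underscores too. If the input may contain unmatched brackets, an alternative is to match the whole [[...]] link and rewrite its contents with a replacement function.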
