Mike

Reputation: 126

Regex to select and replace spaces inside double brackets

I'm writing a script that tidies up MediaWiki files before they are converted to Confluence markup. In this particular scenario I need to fix page links, which in MediaWiki look something like this:

[[this is a page]] 

The problem is that the actual page link would be this_is_a_page. The Universal Wiki Converter isn't smart enough to realise this when it converts to Confluence markup, so you end up with broken links.

I've been trying to write a regex as part of my Python script (I've already stripped out HTML and some other tags like <gallery> etc.). The following regex selects all the links in question:

'\[\[(.*?)\]\]'

I just can't find a programmatic way to select only the spaces inside the [[ ]] so that I can substitute them with underscores. I've tried using matches, with no success.
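As a quick check of what the question's pattern actually selects, here is a minimal sketch using re.findall. The sample string is borrowed from the first answer and is only illustrative. Because (.*?) is non-greedy, it stops at the first "]]", so each link's text is captured separately instead of one long match spanning both links.

```python
import re

# The question's pattern; (.*?) is non-greedy, so it stops at the first "]]".
LINK = re.compile(r'\[\[(.*?)\]\]')

text = '[[this is a page]] bla bla [[this is another page]]'
print(LINK.findall(text))  # ['this is a page', 'this is another page']
```

So the selection part already works; what remains is rewriting only the captured text.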

Upvotes: 2

Views: 1434

Answers (2)

xecgr

Reputation: 5193

Try re.sub with a lambda expression as the replacement:

>>> import re
>>> test = '[[this is a page]] bla bla [[this is another page]]'
>>> re.sub(r'\[\[.+?\]\]', lambda x:x.group().replace(" ","_"), test)
'[[this_is_a_page]] bla bla [[this_is_another_page]]'
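The same idea can be written with a named function and the capture group from the question's regex, which may be easier to extend later. This is a sketch, not part of the original answer; the names LINK, underscore_link and fix_links are hypothetical. Only group(1), the text between the brackets, is modified, and the brackets are added back around the result.

```python
import re

# The question's pattern, with the link text in group 1.
LINK = re.compile(r'\[\[(.*?)\]\]')

def underscore_link(match):
    # Replace spaces only in the captured link text, then re-wrap it in brackets.
    return '[[' + match.group(1).replace(' ', '_') + ']]'

def fix_links(text):
    # re.sub calls underscore_link once per match; text outside links is untouched.
    return LINK.sub(underscore_link, text)

print(fix_links('[[this is a page]] bla bla [[this is another page]]'))
# [[this_is_a_page]] bla bla [[this_is_another_page]]
```

As with the lambda version, spaces outside the [[ ]] (such as those in "bla bla") are left alone.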

Upvotes: 3

Avinash Raj

Reputation: 174706

Try the regex below and replace the matched spaces with underscores:

\s(?=[^\[\]]*]])


>>> import re
>>> s = " [[this is a page]]    goo hghg"
>>> m = re.sub(r'\s(?=[^\[\]]*]])', "_", s)
>>> m
' [[this_is_a_page]]    goo hghg'

\s(?=[^\[\]]*]]) matches a whitespace character only if it is followed by zero or more characters that are neither [ nor ], and then by the two closing brackets ]].
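To make the lookahead's behaviour concrete, here is a small sketch (the name SPACE_IN_LINK is hypothetical). A space just before "[[" is not replaced, because the lookahead immediately hits a "[". A space after the link is not replaced either, because no "]]" follows it. Note that the regex only checks for the closing "]]", so a space before a stray "]]" with no opening "[[" is also replaced, and \s matches tabs and newlines as well as spaces.

```python
import re

SPACE_IN_LINK = re.compile(r'\s(?=[^\[\]]*]])')

# Inside a link, the lookahead sees only non-bracket characters and then "]]".
print(SPACE_IN_LINK.sub('_', 'see [[this is a page]] here'))
# see [[this_is_a_page]] here

# Caveat: only the closing "]]" is checked, so spaces before a stray "]]"
# are replaced even though there is no opening "[[".
print(SPACE_IN_LINK.sub('_', 'no opening bracket]]'))
# no_opening_bracket]]
```

If stray "]]" sequences can occur in your input, matching the whole [[...]] link first (as in the other answer) avoids this.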

Upvotes: 3
