MrCastro

Reputation: 485

Python regex difficulties

This might be a simple one :) I am trying to convert the following:

<gallery>File:ReDescribe.jpg|Photo by:J. K.File:redescribe_still1.pngFile:redescribe_still2.jpegFile:redescribe_still3.jpgFile:redescribe_still4.jpgFile:redescribe_still5.jpg</gallery>

into:

[[File:ReDescribe.jpg|photo by: J K]][[File:redescribe_still1.png]] [[File:redescribe_still2.jpeg]] [[File:redescribe_still3.jpg]] [[File:redescribe_still4.jpg]] [[File:redescribe_still5.jpg]]

To start with, I am looking for a Python regex that selects each File:filename.ext entry on its own.

So far I thought of 'File:(.*?)File', but this expression excludes the last File: entry, since it is not followed by another File. See it in the regex tester: https://regex101.com/r/iV1mD9/1

How could the expression also match the last File: which is followed by </gallery>?

Upvotes: 2

Views: 63

Answers (2)

Avinash Raj

Reputation: 174696

First remove the gallery tags, then apply the positive-lookahead-based regex below.

>>> import re
>>> s = '''<gallery>File:ReDescribe.jpg|Photo by:J. K.File:redescribe_still1.pngFile:redescribe_still2.jpegFile:redescribe_still3.jpgFile:redescribe_still4.jpgFile:redescribe_still5.jpg</gallery>'''
>>> re.sub(r'(File:.+?)(?=File:|$)', r'[[\1]]', re.sub(r'</?gallery>', '', s))
'[[File:ReDescribe.jpg|Photo by:J. K.]][[File:redescribe_still1.png]][[File:redescribe_still2.jpeg]][[File:redescribe_still3.jpg]][[File:redescribe_still4.jpg]][[File:redescribe_still5.jpg]]'
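To see why the lookahead matters, here is a minimal sketch that compares it with the pattern from the question. The `sample` string is a shortened, hypothetical input in the same shape as the original, and the variable names are only for illustration.

```python
import re

# Shortened, hypothetical sample in the same shape as the question's input.
sample = "File:a.jpg|Photo by:J. K.File:b.pngFile:c.jpg"

# Consuming the delimiter: the trailing "File" becomes part of the match,
# so the next entry has lost its own "File" prefix and cannot be matched.
consuming = re.findall(r"File:(.*?)File", sample)

# Lookahead: the next "File:" (or the end of the string) is only checked,
# not consumed, so every entry stays available for its own match.
lookahead = re.findall(r"File:(.+?)(?=File:|$)", sample)

print(consuming)
print(lookahead)
```

On this sample the consuming pattern finds only the first entry, while the lookahead version finds all three.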

Upvotes: 1

vks

Reputation: 67968

File:(.*?)(?=File:|<\/gallery>)

Try this. See the demo. Use a lookahead to make sure the last File: is also captured.

https://regex101.com/r/sJ9gM7/94#python
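As a rough sketch of how this pattern could be used from Python, `re.findall` returns the captured part of each entry, which can then be wrapped in `[[File:...]]`. Joining the links with a single space is an assumption about the desired output, and the `/` is written unescaped since Python does not need it escaped.

```python
import re

s = ('<gallery>File:ReDescribe.jpg|Photo by:J. K.'
     'File:redescribe_still1.pngFile:redescribe_still2.jpeg'
     'File:redescribe_still3.jpgFile:redescribe_still4.jpg'
     'File:redescribe_still5.jpg</gallery>')

# Each entry ends where the next "File:" or the closing tag begins;
# the lookahead checks for that boundary without consuming it.
pattern = re.compile(r'File:(.*?)(?=File:|</gallery>)')

names = pattern.findall(s)  # captured text after each "File:"

# Assumption: links are separated by one space in the final output.
links = ' '.join('[[File:{}]]'.format(name) for name in names)
print(links)
```

Because the closing tag is part of the lookahead, the gallery tags do not have to be stripped first.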

Upvotes: 1
