Reputation: 485
This might be a simple one :) I am trying to convert the following:
<gallery>File:ReDescribe.jpg|Photo by:J. K.File:redescribe_still1.pngFile:redescribe_still2.jpegFile:redescribe_still3.jpgFile:redescribe_still4.jpgFile:redescribe_still5.jpg</gallery>
into:
[[File:ReDescribe.jpg|photo by: J K]][[File:redescribe_still1.png]] [[File:redescribe_still2.jpeg]] [[File:redescribe_still3.jpg]] [[File:redescribe_still4.jpg]] [[File:redescribe_still5.jpg]]
To start with, I am looking for a Python regex that selects each File:filename.ext entry.
So far I thought of 'File:(.*?)File'
but this expression excludes the last File: entry, since it is not followed by another File.
See it in a regex tester: https://regex101.com/r/iV1mD9/1
How could the expression also match the last File: entry, which is followed by </gallery>?
Upvotes: 2
Views: 63
Reputation: 174696
First remove the gallery tags, then apply the positive-lookahead-based regex below.
>>> import re
>>> s = '''<gallery>File:ReDescribe.jpg|Photo by:J. K.File:redescribe_still1.pngFile:redescribe_still2.jpegFile:redescribe_still3.jpgFile:redescribe_still4.jpgFile:redescribe_still5.jpg</gallery>'''
>>> re.sub(r'(File:.+?)(?=File:|$)', r'[[\1]]', re.sub(r'</?gallery>', '', s))
'[[File:ReDescribe.jpg|Photo by:J. K.]][[File:redescribe_still1.png]][[File:redescribe_still2.jpeg]][[File:redescribe_still3.jpg]][[File:redescribe_still4.jpg]][[File:redescribe_still5.jpg]]'
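If you want the links separated by spaces, as in part of the question's target output, the same two steps can be wrapped in a small function. This is only a sketch: the name gallery_to_links and its sep parameter are made up for illustration, and it assumes the whole gallery sits on a single line, since . does not match newlines by default.

```python
import re

def gallery_to_links(text, sep=''):
    """Hypothetical helper: turn one single-line <gallery> string into wiki links."""
    # Step 1: strip the opening and closing gallery tags.
    body = re.sub(r'</?gallery>', '', text)
    # Step 2: each entry runs lazily up to the next "File:" or the end of the string.
    entries = re.findall(r'File:.+?(?=File:|$)', body)
    # Wrap every entry in [[...]] and join with the chosen separator.
    return sep.join('[[%s]]' % entry for entry in entries)

s = ('<gallery>File:ReDescribe.jpg|Photo by:J. K.'
     'File:redescribe_still1.png'
     'File:redescribe_still2.jpeg'
     'File:redescribe_still3.jpg'
     'File:redescribe_still4.jpg'
     'File:redescribe_still5.jpg</gallery>')

links = gallery_to_links(s, sep=' ')
print(links)
```

Calling gallery_to_links(s) with the default sep='' gives the same string as the re.sub one-liner.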
Upvotes: 1
Reputation: 67968
File:(.*?)(?=File:|<\/gallery>)
Try this. It uses a lookahead to make sure the last File: entry is also captured. See the demo:
https://regex101.com/r/sJ9gM7/94#python
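For reference, here is roughly how that pattern behaves in Python with re.findall. It is a sketch using the sample string from the question; the variable names are illustrative. Because the pattern has one capture group, findall returns only the part after File:, so the prefix is added back when building the links. The \/ escape from the regex101 version is not needed in Python, so a plain / is used.

```python
import re

s = ('<gallery>File:ReDescribe.jpg|Photo by:J. K.'
     'File:redescribe_still1.png'
     'File:redescribe_still2.jpeg'
     'File:redescribe_still3.jpg'
     'File:redescribe_still4.jpg'
     'File:redescribe_still5.jpg</gallery>')

# The lookahead stops each lazy match at the next "File:" or at the closing tag,
# without consuming it, so the last entry is matched as well.
pattern = re.compile(r'File:(.*?)(?=File:|</gallery>)')

names = pattern.findall(s)  # captured text only, without the "File:" prefix
links = ['[[File:%s]]' % name for name in names]

print(names)
print(''.join(links))
```

With this approach the gallery tags do not have to be removed first, since the closing tag itself serves as the terminator for the last entry.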
Upvotes: 1