Reputation: 5723
I have some filenames that contain redundant words that I want to get rid of, like VIS, THE, etc.
I was using a regex, but the problem is that the words to be removed can appear at the front or at the back of the filename. To make it clearer, here are some sample filenames:
filenames = ['a_VIS-MarnehNew_24RGB_1110.jpg',
'Marne_04_Vis.jpg',
'VIS_jeep_smoke.jpg',
'IR_fk_ref_01_005.jpg',
'c_LWIR-MarnehNew_24RGB_1110.jpg',
'LWIR-MarnehNew_15RGB_603.jpg',
'Movie_01_IR.jpg',
'THE_fk_ge_03_005.jpg']
The redundant words are VIS, Vis, IR, LWIR and THE, together with every character before them if they appear at the front, or every character after them if they appear at the back.
The expected results would be:
filenames = ['MarnehNew_24RGB_1110',
'Marne_04',
'jeep_smoke',
'fk_ref_01_005',
'MarnehNew_24RGB_1110',
'MarnehNew_15RGB_603',
'Movie_01',
'fk_ge_03_005']
I tried this code, but obviously it's insufficient for the cases where the word is at the back:
import re
# Raw string so that \w is passed to the regex engine unchanged
pattern = re.compile(r'(?:VIS|Vis|IR|LWIR)(?:-|_)(\w+)')
for i, filename in enumerate(filenames):
    matches = pattern.search(filename)
    if matches:
        print(i, matches.group(1))
0 MarnehNew_24RGB_1110
2 jeep_smoke
3 fk_ref_01_005
4 MarnehNew_24RGB_1110
5 MarnehNew_15RGB_603
So, how do I also get rid of the words at the back?
Upvotes: 1
Views: 94
Reputation: 43199
Using your examples, you could use:
(?:^(?:\w_)?(?:VIS|Vis|IR|LWIR|THE)[-_]?)
|
(?:_?(?:VIS|Vis|IR|LWIR))?\.jpg$
Every match needs to be replaced with an empty string; see a demo on regex101.com.
(?:                          # non-capturing group
    ^                        # anchor at the beginning of a string
    (?:\w_)?                 # \w_ optional
    (?:VIS|Vis|IR|LWIR|THE)  # one of ...
    [-_]?                    # - or _ optional
)
|                            # OR
(?:
    _?
    (?:VIS|Vis|IR|LWIR)
)?
\.jpg$
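As a rough sketch of how this could be applied in Python, the pattern can be compiled in verbose mode and passed to re.sub with an empty replacement. The helper name strip_redundant is made up for illustration, and the sketch assumes every filename ends in .jpg, as in the samples from the question.

```python
import re

# Same pattern as above, compiled with re.VERBOSE so whitespace and
# comments inside the pattern are ignored.
PATTERN = re.compile(r"""
    ^(?:\w_)?                      # optional one-character prefix plus underscore
    (?:VIS|Vis|IR|LWIR|THE)[-_]?   # redundant word at the front
    |
    (?:_?(?:VIS|Vis|IR|LWIR))?     # optional redundant word at the back
    \.jpg$                         # the extension, which is removed as well
""", re.VERBOSE)


def strip_redundant(filename):
    """Hypothetical helper: delete every match of PATTERN from filename."""
    return PATTERN.sub("", filename)


filenames = ['a_VIS-MarnehNew_24RGB_1110.jpg',
             'Marne_04_Vis.jpg',
             'VIS_jeep_smoke.jpg',
             'IR_fk_ref_01_005.jpg',
             'c_LWIR-MarnehNew_24RGB_1110.jpg',
             'LWIR-MarnehNew_15RGB_603.jpg',
             'Movie_01_IR.jpg',
             'THE_fk_ge_03_005.jpg']

cleaned = [strip_redundant(name) for name in filenames]
print(cleaned)
```

Note that the second alternative does not list THE, so a trailing _THE would be left in place; that is enough for the sample filenames, but the alternation would need extending if such names can occur.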
Upvotes: 1