Reputation: 21961
I have the following strings in Python:
Vladimir_SW_crop_mask_ERA.hdr
Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr
Ingush_WW_crop_mask.dat
I want to parse these strings to:
Get the crop type, which is either SW or WW
Get the region name, which is the text preceding _SW or _WW
I was using s.split('_')[0] to get the region name, but that fails for Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr, where the region name is Ust_Ordynskiy_Buryatskiy_AO and split('_')[0] returns only Ust.
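A minimal sketch that stays close to the split('_') idea: instead of taking element 0, look for the first token that is exactly SW or WW and join everything before it back together. The helper name parse_filename is hypothetical, and it assumes a region name never contains a standalone SW or WW token.

```python
def parse_filename(name):
    """Return (region, crop) using the first '_'-separated token equal to SW or WW."""
    parts = name.split('_')
    for i, part in enumerate(parts):
        if part in ('SW', 'WW'):
            # Everything before the crop token is the region name.
            return '_'.join(parts[:i]), part
    return None  # no SW/WW token in this name


names = ["Vladimir_SW_crop_mask_ERA.hdr",
         "Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr",
         "Ingush_WW_crop_mask.dat"]

# The original approach keeps only the first piece of a multi-word region.
print(names[1].split('_')[0])  # Ust
for n in names:
    print(parse_filename(n))
```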
Upvotes: 0
Views: 46
Reputation: 120588
You can use str.partition and str.rpartition to do this:
>>> s = 'Vladimir_SW_crop_mask_ERA.hdr'
>>> s.partition('_crop')[0].rpartition('_')[::2]
('Vladimir', 'SW')
>>> s = 'Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr'
>>> s.partition('_crop')[0].rpartition('_')[::2]
('Ust_Ordynskiy_Buryatskiy_AO', 'SW')
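The same idea wrapped in a function, as a sketch: partition('_crop') keeps the text before '_crop', and rpartition('_') then splits that at its last underscore into (region, '_', crop). One caveat worth guarding against: if a name has no '_crop', partition returns the whole string as the head and the result is silently wrong, so this version (the name split_region_crop is hypothetical) raises ValueError in that case.

```python
def split_region_crop(name):
    """Return (region, crop) from names like 'Region_SW_crop_mask...'."""
    # head is e.g. 'Ust_Ordynskiy_Buryatskiy_AO_SW'; sep is '' if '_crop' is absent
    head, sep, _ = name.partition('_crop')
    if not sep:
        raise ValueError(f"no '_crop' in {name!r}")
    # Split at the LAST underscore so multi-word regions stay intact
    region, _, crop = head.rpartition('_')
    return region, crop


for n in ["Vladimir_SW_crop_mask_ERA.hdr",
          "Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr",
          "Ingush_WW_crop_mask.dat"]:
    print(split_region_crop(n))
```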
Upvotes: 2
Reputation: 3947
The following regexp should work:
(.*)_(SW|WW)
This matches everything up to an underscore followed by either SW or WW and puts it in the first matching group, and puts the following SW or WW in the second group. Because .* is greedy, the first group extends to the last _SW or _WW in the string:
import re

strs = ["Vladimir_SW_crop_mask_ERA.hdr",
        "Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr",
        "Ingush_WW_crop_mask.dat"]
for s in strs:
    # group 1 is the region name, group 2 the crop type
    print(re.match(r"(.*)_(SW|WW)", s).groups())
Result:
('Vladimir', 'SW')
('Ust_Ordynskiy_Buryatskiy_AO', 'SW')
('Ingush', 'WW')
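One thing to watch: re.match returns None when a name contains no _SW or _WW, and calling .groups() on None raises AttributeError. A sketch that uses the same pattern, compiled once with named groups, and returns None instead (the names PATTERN and parse are hypothetical):

```python
import re

# Same pattern as above, as a raw string with named groups.
PATTERN = re.compile(r"(?P<region>.*)_(?P<crop>SW|WW)")


def parse(name):
    """Return (region, crop), or None if the name has no _SW/_WW part."""
    m = PATTERN.match(name)
    if m is None:
        return None
    return m.group("region"), m.group("crop")


print(parse("Ingush_WW_crop_mask.dat"))  # ('Ingush', 'WW')
print(parse("readme.txt"))               # None
```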
Upvotes: 1