user308827

Reputation: 21961

Regular expression in python with variable strings

I have the following strings in Python:

Vladimir_SW_crop_mask_ERA.hdr
Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr
Ingush_WW_crop_mask.dat

I want to parse these strings such that:

  1. Get the crop type which can be either SW or WW

  2. Get the region name which is the text preceding _SW or _WW

I was using str.split('_')[0] to get the region name, but that fails for Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr, where the region name is Ust_Ordynskiy_Buryatskiy_AO and itself contains underscores.

Upvotes: 0

Views: 46

Answers (2)

ekhumoro

Reputation: 120588

You can use partition and rpartition to do this:

>>> s = 'Vladimir_SW_crop_mask_ERA.hdr'
>>> s.partition('_crop')[0].rpartition('_')[::2]
('Vladimir', 'SW')
>>> s = 'Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr'
>>> s.partition('_crop')[0].rpartition('_')[::2]
('Ust_Ordynskiy_Buryatskiy_AO', 'SW')
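To spell out what that one-liner does, here is a rough sketch that splits it into steps. The function name split_region_crop is made up for illustration, and it assumes every filename contains '_crop' right after the crop type, as in the question's examples:

```python
def split_region_crop(filename):
    """Return (region, crop) from a name like 'Vladimir_SW_crop_mask_ERA.hdr'.

    Hypothetical helper; assumes '_crop' directly follows the crop type.
    """
    # Keep only the text before the first '_crop',
    # e.g. 'Ust_Ordynskiy_Buryatskiy_AO_SW'.
    head, sep, _ = filename.partition('_crop')
    if not sep:
        raise ValueError("no '_crop' marker in %r" % filename)
    # Split on the LAST underscore: everything before it is the region,
    # everything after it is the crop type.
    region, _, crop = head.rpartition('_')
    return region, crop


print(split_region_crop('Ingush_WW_crop_mask.dat'))
```

The [::2] slice in the one-liner just drops the middle element (the separator) of the 3-tuple that rpartition returns; the sketch does the same by unpacking.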

Upvotes: 2

Jasper

Reputation: 3947

The following regexp should work:

(.*)_(SW|WW)

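This matches everything up to an underscore followed by either SW or WW. The text before that underscore goes into the first group and the SW or WW into the second. Because .* is greedy, the region group extends to the last _SW or _WW in the string, so underscores inside the region name are not a problem: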

import re

strs = ["Vladimir_SW_crop_mask_ERA.hdr",
        "Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr",
        "Ingush_WW_crop_mask.dat"]

for s in strs:
    print(re.match("(.*)_(SW|WW)", s).groups())

Result:

('Vladimir', 'SW')
('Ust_Ordynskiy_Buryatskiy_AO', 'SW')
('Ingush', 'WW')
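If some filenames might not follow the pattern, re.match returns None and calling .groups() on it raises AttributeError. A slightly more defensive variant could look like the sketch below. The names PATTERN and parse_name are made up for illustration, and anchoring on _crop is an assumption based on the example filenames:

```python
import re

# Named groups make the result easier to read. The trailing '_crop'
# (an assumption taken from the sample names) keeps a region that happens
# to end in '_SW' or '_WW' from being mistaken for the crop type.
PATTERN = re.compile(r"(?P<region>.+)_(?P<crop>SW|WW)_crop")


def parse_name(filename):
    """Return (region, crop), or None if the name does not match."""
    m = PATTERN.match(filename)
    if m is None:
        return None
    return m.group("region"), m.group("crop")


for s in ["Ust_Ordynskiy_Buryatskiy_AO_SW_crop_mask_ERA.hdr", "readme.txt"]:
    print(s, "->", parse_name(s))
```

Returning None instead of raising is just one choice; raising a ValueError would work equally well if a non-matching name should be treated as an error.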

Upvotes: 1
