Reputation: 164
I'm writing a reader that picks files by their names, so I wrote this regex mask:
import re

def getmatches(datafilelist, regex=None):
    """Takes list of search strings + regex. Returns a list of match objects"""
    if not regex:
        regex = re.compile(
            r"""
            (?P<ftype>[A-Z0-9]{5})  # band type of data file
            _[a-z]+                 # sat id
            _d(?P<date>\d{8})       # acq date
            _t(?P<time>\d{7})       # granule start time UTC
            _e\d+                   # granule end time UTC
            _b(?P<orbit>\d+)        # orbit number
            _c\d+                   # file creation date/time
            _\w+.h5                 # more stuff
            """, re.X)
    return [regex.search(filename) for filename in datafilelist]
File Name: SVI01_j01_d20191004_t0717193_e0730075_b00001_c20191004083048126000_ipop_dev.h5
What's wrong?
Upvotes: 0
Views: 285
Reputation: 27733
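Your expression is only missing a few small things. The satellite ID j01 contains digits, so _[a-z]+ has to become _[a-z0-9]+, and the dot before h5 should be escaped as \.h5 so that it matches only a literal dot: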
(?P<ftype>[A-Z0-9]{5})_[a-z0-9]+_d(?P<date>\d{8})_t(?P<time>\d{7})_e\d+_b(?P<orbit>\d+)_c\d+_\w+\.h5
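A quick sketch to see which change actually matters, using only the standard re module. The question's [a-z]+ cannot consume j01, so the whole search returns None; the unescaped dot is merely looser than intended (it also accepts a literal "."), so it does not cause the failure by itself. The string SVI01_ipop_devXh5 is a made-up example chosen only to show the loose dot.

```python
import re

filename = "SVI01_j01_d20191004_t0717193_e0730075_b00001_c20191004083048126000_ipop_dev.h5"

# Satellite ID as written in the question: lowercase letters only.
old_sat = re.compile(r"[A-Z0-9]{5}_[a-z]+_d\d{8}")
# Satellite ID as written in the answer: lowercase letters and digits.
new_sat = re.compile(r"[A-Z0-9]{5}_[a-z0-9]+_d\d{8}")

print(old_sat.search(filename))          # None: [a-z]+ cannot consume "j01"
print(new_sat.search(filename).group())  # SVI01_j01_d20191004

# An unescaped "." matches any character, so a bogus "Xh5" ending is accepted.
# "SVI01_ipop_devXh5" is a hypothetical name used only for this check.
print(bool(re.search(r"_\w+.h5$", "SVI01_ipop_devXh5")))   # True
print(bool(re.search(r"_\w+\.h5$", "SVI01_ipop_devXh5")))  # False
```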
import re
regex = r'(?P<ftype>[A-Z0-9]{5})_[a-z0-9]+_d(?P<date>\d{8})_t(?P<time>\d{7})_e\d+_b(?P<orbit>\d+)_c\d+_\w+\.h5'
string = '''
SVI01_j01_d20191004_t0717193_e0730075_b00001_c20191004083048126000_ipop_dev.h5
'''
print(re.findall(regex, string))
[('SVI01', '20191004', '0717193', '00001')]
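If you prefer to keep the commented verbose pattern from the question, a minimal sketch is to apply the same two fixes inside re.X mode and read the named groups with groupdict(). The module-level name DEFAULT_PATTERN is only an illustrative choice, not something from the original code.

```python
import re

# Verbose pattern from the question with the two fixes applied:
# the satellite ID accepts digits, and the extension dot is escaped.
DEFAULT_PATTERN = re.compile(
    r"""
    (?P<ftype>[A-Z0-9]{5})   # band type of data file
    _[a-z0-9]+               # sat id, e.g. j01
    _d(?P<date>\d{8})        # acq date
    _t(?P<time>\d{7})        # granule start time UTC
    _e\d+                    # granule end time UTC
    _b(?P<orbit>\d+)         # orbit number
    _c\d+                    # file creation date/time
    _\w+\.h5                 # more stuff
    """,
    re.X,
)


def getmatches(datafilelist, regex=None):
    """Takes list of search strings + regex. Returns a list of match objects (None for no match)."""
    if regex is None:
        regex = DEFAULT_PATTERN
    return [regex.search(filename) for filename in datafilelist]


files = ["SVI01_j01_d20191004_t0717193_e0730075_b00001_c20191004083048126000_ipop_dev.h5"]
for m in getmatches(files):
    if m:
        # Prints the named groups: ftype, date, time and orbit.
        print(m.groupdict())
```

Named groups keep the field extraction readable, and the comments in verbose mode document each part of the file-name convention.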
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
Upvotes: 1