Denis

Reputation: 164

Reading files by pattern

I'm writing a reader that selects files by their name. I wrote this regex mask:


import re


def getmatches(datafilelist, regex=None):
    """Takes list of search strings + regex. Returns a list of match objects"""
    if not regex:
        regex = re.compile(
            r"""
            (?P<ftype>[A-Z0-9]{5})      # band type of data file
            _[a-z]+                     # sat id
            _d(?P<date>\d{8})           # acq date
            _t(?P<time>\d{7})           # granule start time UTC
            _e\d+                       # granule end time UTC
            _b(?P<orbit>\d+)            # orbit number
            _c\d+                       # file creation date/time
            _\w+.h5                     # more stuff
            """, re.X)
    return [regex.search(filename) for filename in datafilelist]

File Name: SVI01_j01_d20191004_t0717193_e0730075_b00001_c20191004083048126000_ipop_dev.h5

The regex doesn't match this file name. What's wrong?

Upvotes: 0

Views: 285

Answers (1)

Emma

Reputation: 27733

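It is just missing two small things: the sat id in your file name (j01) contains digits, so [a-z]+ has to become [a-z0-9]+, and the dot before h5 should be escaped as \. so that it only matches a literal dot: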

(?P<ftype>[A-Z0-9]{5})_[a-z0-9]+_d(?P<date>\d{8})_t(?P<time>\d{7})_e\d+_b(?P<orbit>\d+)_c\d+_\w+\.h5
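The corrected pattern differs from the one in the question in two places: the sat id character class and the escaped dot. As a rough check of each change in isolation, the sketch below tests only the affected fragments; the string "_ipop_devxh5" is a made-up input used for illustration, not a real file name.

```python
import re

name = "SVI01_j01_d20191004_t0717193_e0730075_b00001_c20191004083048126000_ipop_dev.h5"

# Original sat-id fragment: letters only, so "j01" cannot be consumed.
broken = re.search(r"_[a-z]+_d\d{8}", name)
print(broken)         # None

# Fixed fragment: letters and digits.
fixed = re.search(r"_[a-z0-9]+_d\d{8}", name)
print(fixed.group())  # _j01_d20191004

# An unescaped dot matches any character, so a name without
# a real ".h5" extension slips through.
loose = re.search(r"_\w+.h5", "_ipop_devxh5")
strict = re.search(r"_\w+\.h5", "_ipop_devxh5")
print(loose is not None, strict is not None)  # True False
```

The unescaped dot did not cause the failure on the sample file name; it only makes the pattern looser than intended.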

Test

import re

regex = r'(?P<ftype>[A-Z0-9]{5})_[a-z0-9]+_d(?P<date>\d{8})_t(?P<time>\d{7})_e\d+_b(?P<orbit>\d+)_c\d+_\w+\.h5'
string = '''
SVI01_j01_d20191004_t0717193_e0730075_b00001_c20191004083048126000_ipop_dev.h5
'''

print(re.findall(regex, string))

Output

[('SVI01', '20191004', '0717193', '00001')]
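If you want to keep the commented verbose layout and the named groups from the question, the same two fixes can be applied there. This is only a sketch: the helper name get_fields, the skipping of non-matching names, and the "notes.txt" entry are assumptions made for illustration, not part of the original code.

```python
import re

# Verbose pattern from the question with the two fixes applied.
PATTERN = re.compile(
    r"""
    (?P<ftype>[A-Z0-9]{5})      # band type of data file
    _[a-z0-9]+                  # sat id, may contain digits
    _d(?P<date>\d{8})           # acq date
    _t(?P<time>\d{7})           # granule start time UTC
    _e\d+                       # granule end time UTC
    _b(?P<orbit>\d+)            # orbit number
    _c\d+                       # file creation date/time
    _\w+\.h5                    # more stuff, with a literal dot
    """,
    re.X,
)


def get_fields(filenames):
    """Return a dict of the named groups for each file name that matches."""
    fields = []
    for filename in filenames:
        match = PATTERN.search(filename)
        if match is not None:  # skip names that do not match
            fields.append(match.groupdict())
    return fields


names = [
    "SVI01_j01_d20191004_t0717193_e0730075_b00001_c20191004083048126000_ipop_dev.h5",
    "notes.txt",  # hypothetical non-matching name
]
print(get_fields(names))
# [{'ftype': 'SVI01', 'date': '20191004', 'time': '0717193', 'orbit': '00001'}]
```

Filtering out None here avoids the original list comprehension returning None entries for names that do not fit the mask.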

If you wish to simplify, modify, or explore the expression, it is explained in the top right panel of regex101.com. There you can also see how it matches against some sample inputs.


Upvotes: 1
