Mike Furlender

Reputation: 4019

Python: iterating by match to regex

I am making a script to automatically parse some text data (with a complex structure) and insert it into a MySQL database.

I would like to have multiple for loops that iterate over a list of files based on regex matches to the file names. In the end I will concatenate them and insert them into the database.

Here are my regex expressions:

import re

Trgx = re.compile(r'([a-zA-Z0-9]{3,4})_.*_.*_.*$')
Dtrgx = re.compile(r'[a-zA-Z0-9]{3,4}_[a-zA-Z0-9]{3,4}_([0-9]{10})_[0-9]{3}')
Mrgx = re.compile(r'.*_([a-zA-Z0-9]{3,4})_.*$')
Hrgx = re.compile(r'.*([0-9]{3})\.csv$')

My filenames look like this:

ecd_cdd_2012102100_000.csv
ecd_cdd_2012102100_024.csv
ecd_hdd_2012102200_000.csv
ecd_hdd_2012102200_024.csv
ecd_hdd_2012102200_048.csv
ecd_avgt_2012102200_120.csv
ecd_avgt_2012102200_144.csv
ecd_avgt_2012102200_168.csv
ecd_mint_2012102200_192.csv
ecd_maxt_2012102200_144.csv
ecd_maxt_2012102200_168.csv
ecd_cdd_2012102200_000.csv
ecd_cdd_2012102200_024.csv

Each expression captures a different part of the file name. Every file name matches every regular expression, but .group(1) holds a different value for each one.
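To make that concrete, here is a small sketch that applies the four patterns to one of the file names listed above. The labels T, Dt, M and H are just dictionary keys chosen for this illustration, and Python 3 is assumed.

```python
import re

# The four patterns from the question, written as raw strings.
Trgx = re.compile(r'([a-zA-Z0-9]{3,4})_.*_.*_.*$')
Dtrgx = re.compile(r'[a-zA-Z0-9]{3,4}_[a-zA-Z0-9]{3,4}_([0-9]{10})_[0-9]{3}')
Mrgx = re.compile(r'.*_([a-zA-Z0-9]{3,4})_.*$')
Hrgx = re.compile(r'.*([0-9]{3}).csv$')

name = 'ecd_avgt_2012102200_120.csv'

# Every pattern matches the same name, but each one captures a different field.
captured = {
    label: rgx.match(name).group(1)
    for label, rgx in [('T', Trgx), ('Dt', Dtrgx), ('M', Mrgx), ('H', Hrgx)]
}
print(captured)
# {'T': 'ecd', 'Dt': '2012102200', 'M': 'avgt', 'H': '120'}
```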

I want to iterate through the files using the regex expressions as "grouping" elements, so that I can concatenate the files in the right order.

Something like this:

for fileName in fileNameList
    for each distinct value in Trgx.group(1)
        for each distinct value in Dtrgx.group(1)
            for each distinct value in Hrgx.group(1)
                do whatever

Upvotes: 0

Views: 1027

Answers (1)

nneonneo

Reputation: 179442

It may be easier to combine the regexes together

re_fn = re.compile(r'(?P<T>[a-zA-Z0-9]{3,4})_(?P<M>[a-zA-Z0-9]{3,4})_(?P<Dt>[0-9]{10})_(?P<H>[0-9]{3})\.csv$')

and save yourself the trouble of maintaining four regexes.

Then you can just do

m = re_fn.match(fileName)
if m is not None:  # match() returns None for names that do not fit the pattern
    groups = m.groupdict()
    # do stuff with groups['T'], groups['M'], groups['Dt'], groups['H']
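For the nested "for each distinct value" loops in the question, one possible approach (not the only one) is to sort the parsed names by a (T, Dt, H) key and let itertools.groupby collect the files that share it. The sketch below assumes Python 3; group_files and file_name_list are hypothetical names introduced here, and the sample list is taken from the file names in the question.

```python
import re
from itertools import groupby

# Combined pattern from the answer, with the dot escaped and the end anchored.
re_fn = re.compile(
    r'(?P<T>[a-zA-Z0-9]{3,4})_(?P<M>[a-zA-Z0-9]{3,4})'
    r'_(?P<Dt>[0-9]{10})_(?P<H>[0-9]{3})\.csv$'
)

def group_files(file_names):
    """Return {(T, Dt, H): [file names]} with keys in sorted order."""
    parsed = []
    for name in file_names:
        m = re_fn.match(name)
        if m is None:
            continue  # skip names that do not follow the expected pattern
        parsed.append(((m.group('T'), m.group('Dt'), m.group('H')), name))
    parsed.sort()  # groupby only merges adjacent items, so sort by key first
    return {
        key: [name for _, name in items]
        for key, items in groupby(parsed, key=lambda pair: pair[0])
    }

file_name_list = [  # hypothetical sample, taken from the question's listing
    'ecd_cdd_2012102200_000.csv',
    'ecd_hdd_2012102200_000.csv',
    'ecd_hdd_2012102200_024.csv',
    'ecd_cdd_2012102200_024.csv',
]

grouped = group_files(file_name_list)
for (t, dt, h), names in grouped.items():
    print(t, dt, h, names)  # "do whatever" goes here
```

Each iteration of the final loop sees one distinct (T, Dt, H) combination together with all files that belong to it, so the files can be concatenated per group before the database insert. Changing the order of the fields in the key changes the grouping order.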

Upvotes: 2
