I am making a script to automatically parse some text data (with a complex structure) and insert it into a MySQL database.
I would like to have multiple for loops that iterate over a list of files based on regex matches to the file names. In the end I will concatenate them and insert them into the database.
Here are my regular expressions:
import re
Trgx = re.compile(r'([a-zA-Z0-9]{3,4})_.*_.*_.*$')
Dtrgx = re.compile(r'[a-zA-Z0-9]{3,4}_[a-zA-Z0-9]{3,4}_([0-9]{10})_[0-9]{3}')
Mrgx = re.compile(r'.*_([a-zA-Z0-9]{3,4})_.*$')
Hrgx = re.compile(r'.*([0-9]{3})\.csv$')
My filenames look like this:
ecd_cdd_2012102100_000.csv
ecd_cdd_2012102100_024.csv
ecd_hdd_2012102200_000.csv
ecd_hdd_2012102200_024.csv
ecd_hdd_2012102200_048.csv
ecd_avgt_2012102200_120.csv
ecd_avgt_2012102200_144.csv
ecd_avgt_2012102200_168.csv
ecd_mint_2012102200_192.csv
ecd_maxt_2012102200_144.csv
ecd_maxt_2012102200_168.csv
ecd_cdd_2012102200_000.csv
ecd_cdd_2012102200_024.csv
Each expression captures a different part of the file name: every file name matches every regular expression, but .group(1) is populated with a different value for each one.
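To make this concrete, here is a small check of what group(1) returns for each pattern on one of the sample names. It uses the patterns above as raw strings with the dot in ".csv" escaped, and assumes Python 3.7+ so the printed dict keeps insertion order; the helper `captures` is hypothetical, added only for illustration.

```python
import re

# The four patterns from the question (raw strings, literal dot escaped).
Trgx = re.compile(r'([a-zA-Z0-9]{3,4})_.*_.*_.*$')
Dtrgx = re.compile(r'[a-zA-Z0-9]{3,4}_[a-zA-Z0-9]{3,4}_([0-9]{10})_[0-9]{3}')
Mrgx = re.compile(r'.*_([a-zA-Z0-9]{3,4})_.*$')
Hrgx = re.compile(r'.*([0-9]{3})\.csv$')

def captures(file_name):
    """Hypothetical helper: return group(1) of each pattern for one name."""
    return {
        name: rgx.match(file_name).group(1)
        for name, rgx in [('T', Trgx), ('Dt', Dtrgx), ('M', Mrgx), ('H', Hrgx)]
    }

print(captures('ecd_avgt_2012102200_120.csv'))
# {'T': 'ecd', 'Dt': '2012102200', 'M': 'avgt', 'H': '120'}
```

Note that Mrgx only lands on the second field because the greedy `.*_` backtracks until `[a-zA-Z0-9]{3,4}_` can match, which happens only at "avgt_".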
I want to iterate through the files using the regular expressions as "grouping" elements, so that I can concatenate them in the right order.
Something like this:
for fileName in fileNameList:
    for each distinct value in Trgx.group(1):
        for each distinct value in Dtrgx.group(1):
            for each distinct value in Hrgx.group(1):
                do whatever
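One way to get this kind of grouping without nesting a loop per regex is to sort the names by a composite key and then use itertools.groupby. This is a sketch under an assumption the question does not state outright: that files sharing the same T, M and Dt values belong together and should be concatenated in order of H. The helper names `group_key`, `hour` and `grouped_files` are hypothetical.

```python
import re
from itertools import groupby

Trgx = re.compile(r'([a-zA-Z0-9]{3,4})_.*_.*_.*$')
Dtrgx = re.compile(r'[a-zA-Z0-9]{3,4}_[a-zA-Z0-9]{3,4}_([0-9]{10})_[0-9]{3}')
Mrgx = re.compile(r'.*_([a-zA-Z0-9]{3,4})_.*$')
Hrgx = re.compile(r'.*([0-9]{3})\.csv$')

def group_key(file_name):
    # Assumed grouping: same prefix (T), same second field (M), same date (Dt).
    return (Trgx.match(file_name).group(1),
            Mrgx.match(file_name).group(1),
            Dtrgx.match(file_name).group(1))

def hour(file_name):
    return Hrgx.match(file_name).group(1)

def grouped_files(file_name_list):
    # groupby only merges *adjacent* items, so sort by the full key first;
    # adding the hour as the last sort field puts each group in hour order.
    ordered = sorted(file_name_list, key=lambda f: group_key(f) + (hour(f),))
    for key, files in groupby(ordered, key=group_key):
        yield key, list(files)

fileNameList = ['ecd_hdd_2012102200_048.csv', 'ecd_hdd_2012102200_000.csv',
                'ecd_cdd_2012102100_024.csv', 'ecd_hdd_2012102200_024.csv']
for key, files in grouped_files(fileNameList):
    print(key, files)
# ('ecd', 'cdd', '2012102100') ['ecd_cdd_2012102100_024.csv']
# ('ecd', 'hdd', '2012102200') ['ecd_hdd_2012102200_000.csv', 'ecd_hdd_2012102200_024.csv', 'ecd_hdd_2012102200_048.csv']
```

Each yielded list is already in the order you would concatenate it, so "do whatever" becomes a single loop over the groups rather than three nested loops.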
Answer:
It may be easier to combine the regexes into a single pattern with named groups, and save yourself the trouble of maintaining four of them:
re_fn = re.compile(r'(?P<T>[a-zA-Z0-9]{3,4})_(?P<M>[a-zA-Z0-9]{3,4})_(?P<Dt>[0-9]{10})_(?P<H>[0-9]{3})\.csv$')
Then you can just do:
m = re_fn.match(fileName)
if m:  # match() returns None for names that don't fit the pattern
    groups = m.groupdict()
    # do stuff with groups['T'], groups['M'], groups['Dt'], groups['H']
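Applied to a whole list of files, the combined pattern can bucket names in a single pass. This is a sketch assuming that files sharing T, M and Dt should be concatenated in order of H, and that names not matching the pattern should simply be skipped; `bucket_files` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Combined pattern with named groups (raw string, dot escaped, anchored with $).
re_fn = re.compile(r'(?P<T>[a-zA-Z0-9]{3,4})_(?P<M>[a-zA-Z0-9]{3,4})_'
                   r'(?P<Dt>[0-9]{10})_(?P<H>[0-9]{3})\.csv$')

def bucket_files(file_name_list):
    """Map (T, M, Dt) -> file names sorted by H."""
    buckets = defaultdict(list)
    for fileName in file_name_list:
        m = re_fn.match(fileName)
        if m is None:
            continue  # skip names that don't follow the pattern
        groups = m.groupdict()
        key = (groups['T'], groups['M'], groups['Dt'])
        buckets[key].append((groups['H'], fileName))
    # H is always three zero-padded digits, so string order equals numeric order.
    return {key: [name for _, name in sorted(items)]
            for key, items in buckets.items()}

files = ['ecd_avgt_2012102200_168.csv', 'notes.txt',
         'ecd_avgt_2012102200_120.csv', 'ecd_mint_2012102200_192.csv']
for key, names in bucket_files(files).items():
    print(key, names)
# ('ecd', 'avgt', '2012102200') ['ecd_avgt_2012102200_120.csv', 'ecd_avgt_2012102200_168.csv']
# ('ecd', 'mint', '2012102200') ['ecd_mint_2012102200_192.csv']
```

If the hour field could ever vary in width, sorting on int(groups['H']) instead of the raw string would keep the order numeric.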