I have 3 files in /some/dir:
$ ls /some/dir
fiot_csv2apex_nomuratest.xml fiot_csv2apex_nomurauat.xml fiot_csv2apex_nomura.xml
I want my script to extract only the file whose name does NOT contain the substring "uat" or "test".
To start off simply, I'm only trying to exclude the "uat" substring, but my attempts fail.
Here is the entire script that does NOT try to exclude any of those 3 files:
#!/usr/bin/env python
import xml.etree.ElementTree as ET, sys, os, re, fnmatch
param = sys.argv[1]
client = param.split('_')[0]
market = param.split('_')[1]
suffix = param.split('_')[2]
toapex_pattern = market + '*2apex*' + client + '*' + '.xml'
files_dir = '/some/dir'
config_files = os.listdir(files_dir)
for f in config_files:
    if fnmatch.fnmatch(f, toapex_pattern):
        print(f)
The above script outputs all 3 files in /some/dir, as expected. The script is run like this:
python /test/scripts/regex.py nomura_fiot_b
I attempted to exclude "uat" by modifying the toapex_pattern variable like this:
toapex_pattern = market + '*2apex*' + client + '(?!uat)' + '*' + '.xml'
However, after that the script did not produce any output.
I also tried this:
toapex_pattern = re.compile(market + '*2apex*' + client + '(?!uat)' + '*' + '.xml')
But this resulted in a type error:
TypeError: object of type '_sre.SRE_Pattern' has no len()
And if I try this:
toapex_pattern = market + '*2apex*' + client + '[^uat]' + '*' + '.xml'
the output is:
fiot_csv2apex_nomuratest.xml
fiot_csv2apex_nomurauat.xml
The desired output is:
fiot_csv2apex_nomura.xml
How should I modify the toapex_pattern variable to achieve the desired output?
An fnmatch pattern is not a regular expression. Things like (?!...) won't work.
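To see concretely how fnmatch reads the patterns tried in the question, here is a small sketch. The three file names are hard-coded from the question so no directory is needed, and `glob_filter` is a hypothetical helper:

```python
import fnmatch
import re

# The three file names from the question.
NAMES = [
    "fiot_csv2apex_nomuratest.xml",
    "fiot_csv2apex_nomurauat.xml",
    "fiot_csv2apex_nomura.xml",
]


def glob_filter(pattern):
    """Return the names that fnmatch accepts for a glob pattern."""
    return [name for name in NAMES if fnmatch.fnmatch(name, pattern)]


# In a glob, "(", "!" and ")" are literal characters and "?" matches any
# single character, so this only matches names containing parentheses.
lookahead_attempt = glob_filter("fiot*2apex*nomura(?!uat)*.xml")

# "[^uat]" is NOT a negated set in fnmatch (that would be "[!uat]"); it is
# a set of the characters "^", "u", "a" and "t".
caret_attempt = glob_filter("fiot*2apex*nomura[^uat]*.xml")

# fnmatch expects a string pattern; a compiled regex raises TypeError.
try:
    fnmatch.fnmatch(NAMES[0], re.compile("nomura(?!uat)"))
    compiled_error = None
except TypeError as exc:
    compiled_error = exc

print(lookahead_attempt)  # []
print(caret_attempt)  # ['fiot_csv2apex_nomuratest.xml', 'fiot_csv2apex_nomurauat.xml']
print(type(compiled_error).__name__)  # TypeError
```

The `[^uat]` case accounts for the output reported in the question: the set matches the "t" of "test" and the "u" of "uat", but not the "." that follows "nomura" in the wanted file.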
Generally, exclusion patterns will not work well with fnmatch. You could do something like [!u][!a][!t] to keep "uat" out of that position, but each bracket only constrains its own character (not u, then not a, then not t), so it also rejects strings like "ust". It would also implicitly require at least 3 characters there, and you could not control any further which ones.
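A quick check of the [!u][!a][!t] idea against the question's three names (hard-coded here). The name "fiot_csv2apex_nomuraust.xml" is made up, used only to show the side effect:

```python
import fnmatch

NAMES = [
    "fiot_csv2apex_nomuratest.xml",
    "fiot_csv2apex_nomurauat.xml",
    "fiot_csv2apex_nomura.xml",
]

# "[!x]" is fnmatch's negated set: exactly one character that is not x.
pattern = "fiot*2apex*nomura[!u][!a][!t]*.xml"

matched = [name for name in NAMES if fnmatch.fnmatch(name, pattern)]
print(matched)  # ['fiot_csv2apex_nomuratest.xml']

# Hypothetical name: rejected only because its first letter after
# "nomura" is "u", even though it does not contain "uat".
ust_matches = fnmatch.fnmatch("fiot_csv2apex_nomuraust.xml", pattern)
print(ust_matches)  # False
```

With the question's names it keeps the test file and drops the one you actually want, because "nomura.xml" leaves no room for three extra characters before ".xml".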
Spare yourself the hassle: use fnmatch to get into the general ballpark, and then use a second step to exclude the things you don't want.
files_dir = '/some/dir'
config_files = os.listdir(files_dir)
for file_name in config_files:
    if fnmatch.fnmatch(file_name, toapex_pattern) and "uat" not in file_name:
        print(file_name)
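The same two-step idea can be extended to the "test" exclusion the question ultimately asks for. This is a sketch: `select_config_files` and its `excluded` parameter are hypothetical names, and the substring check applies to the whole file name, not only the part after the client name.

```python
import fnmatch


def select_config_files(names, pattern, excluded=("uat", "test")):
    """Keep names that match the glob `pattern` and contain none of `excluded`."""
    return [
        name
        for name in names
        if fnmatch.fnmatch(name, pattern)
        and not any(word in name for word in excluded)
    ]


names = [
    "fiot_csv2apex_nomuratest.xml",
    "fiot_csv2apex_nomurauat.xml",
    "fiot_csv2apex_nomura.xml",
]
print(select_config_files(names, "fiot*2apex*nomura*.xml"))
# ['fiot_csv2apex_nomura.xml']
```

In the script itself this would be called as `select_config_files(os.listdir(files_dir), toapex_pattern)`.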
Alternatively, use regex from the start.
import os
import re

files_dir = '/some/dir'
config_files = os.listdir(files_dir)
# ...
toapex_pattern = re.escape(market) + '.*2apex.*' + re.escape(client) + r'(?!uat).*\.xml$'
for file_name in config_files:
    if re.match(toapex_pattern, file_name):
        print(file_name)
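If you go the regex route, the lookahead can list several alternatives. A minimal sketch, assuming the same market and client values as in the question (`build_config_regex` is a hypothetical helper). It uses `re.fullmatch` instead of `re.match` with `$`, so the whole name must match:

```python
import re


def build_config_regex(market, client, excluded=("uat", "test")):
    """Compile <market>...2apex...<client><no excluded word>....xml."""
    # (?!uat|test) fails if the text right after the client name starts
    # with any of the excluded words.
    alternatives = "|".join(re.escape(word) for word in excluded)
    return re.compile(
        re.escape(market)
        + r".*2apex.*"
        + re.escape(client)
        + r"(?!" + alternatives + r")"
        + r".*\.xml"
    )


names = [
    "fiot_csv2apex_nomuratest.xml",
    "fiot_csv2apex_nomurauat.xml",
    "fiot_csv2apex_nomura.xml",
]
pattern = build_config_regex("fiot", "nomura")
matched = [name for name in names if pattern.fullmatch(name)]
print(matched)  # ['fiot_csv2apex_nomura.xml']
```

The lookahead only inspects the characters immediately after the client name, so a name like "fiot_csv2apex_nomura_test.xml" would still match; the substring check in the fnmatch version does not have that gap.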
Just throwing it in: you could call the script as python /test/scripts/regex.py nomura fiot b and use sys.argv[1], sys.argv[2] and sys.argv[3] directly, without having to split anything yourself first.
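A sketch of that, with a hypothetical `parse_args` helper that takes an argv-style list so it can be tried without a command line; in the real script you would pass `sys.argv`:

```python
def parse_args(argv):
    """Unpack client, market and suffix from separate command-line arguments.

    `argv` is laid out like sys.argv: argv[0] is the script path.
    """
    if len(argv) != 4:
        raise SystemExit("usage: regex.py CLIENT MARKET SUFFIX")
    _, client, market, suffix = argv
    return client, market, suffix


# In the real script this would be parse_args(sys.argv); a literal list is
# used here so the sketch runs on its own.
client, market, suffix = parse_args(["/test/scripts/regex.py", "nomura", "fiot", "b"])
print(client, market, suffix)  # nomura fiot b
```

If you prefer to keep the single nomura_fiot_b argument, tuple unpacking does the same in one line: `client, market, suffix = param.split('_')`.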