kamokoba

Reputation: 597

Python 2.6: how to exclude filenames that contain a certain substring?

I have 3 files in /some/dir:

$ ls /some/dir
fiot_csv2apex_nomuratest.xml  fiot_csv2apex_nomurauat.xml  fiot_csv2apex_nomura.xml

I want my script to extract only the file that does NOT contain substrings "uat" or "test" in its filename.

To start off simply, I'm only trying to exclude the "uat" substring but my attempts fail.

Here is the entire script that does NOT try to exclude any of those 3 files:

#!/usr/bin/env python

import xml.etree.ElementTree as ET, sys, os, re, fnmatch

param = sys.argv[1]
client = param.split('_')[0]
market = param.split('_')[1]
suffix = param.split('_')[2]

toapex_pattern = market + '*2apex*' + client + '*' + '.xml'

files_dir = '/some/dir'
config_files = os.listdir(files_dir)

for f in config_files:
    if fnmatch.fnmatch(f, toapex_pattern):
        print(f)

The above script outputs all 3 files in /some/dir, as expected. The script is run like this:

python /test/scripts/regex.py nomura_fiot_b

I attempted to exclude "uat" by modifying toapex_pattern variable like this:

toapex_pattern = market + '*2apex*' + client + '(?!uat)' + '*' + '.xml'

However, after that the script did not produce any output.

I also tried this:

toapex_pattern = re.compile(market + '*2apex*' + client + '(?!uat)' + '*' + '.xml')

But this resulted in a type error:

TypeError: object of type '_sre.SRE_Pattern' has no len()

And if I try this:

toapex_pattern = market + '*2apex*' + client + '[^uat]' + '*' + '.xml'

the output is:

fiot_csv2apex_nomuratest.xml
fiot_csv2apex_nomurauat.xml

The desired output is:

fiot_csv2apex_nomura.xml

How should I modify the toapex_pattern variable to achieve the desired output?

Upvotes: 0

Views: 945

Answers (1)

Tomalak

Reputation: 338158

An fnmatch pattern is not a regular expression. Things like (?!...) won't work.
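To see why the lookahead attempt printed nothing, here is a small sketch using the three file names from the question. The pattern string is the one the question's code builds for market "fiot" and client "nomura"; the last file name is made up purely for illustration.

```python
import fnmatch

names = [
    "fiot_csv2apex_nomuratest.xml",
    "fiot_csv2apex_nomurauat.xml",
    "fiot_csv2apex_nomura.xml",
]

# In fnmatch, "?" means "any single character", while "(", "!" and ")" are
# plain literals. So this pattern demands the literal text "(", then one
# character, then "!uat)" right after the client name.
lookahead_pattern = "fiot*2apex*nomura(?!uat)*.xml"

lookahead_hits = [n for n in names if fnmatch.fnmatch(n, lookahead_pattern)]
print(lookahead_hits)

# A hypothetical name that really contains those characters does match.
literal_hit = fnmatch.fnmatch("fiot_csv2apex_nomura(x!uat).xml", lookahead_pattern)
print(literal_hit)
```

None of the real files contain a parenthesis, which is why the script went silent.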

Generally, exclusive patterns will not work well with fnmatch. You can do something like this

[!u][!a][!t]

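to match three characters where the first is not "u", the second is not "a" and the third is not "t"... but that would still mean you'd implicitly require at least 3 characters in that position, and it would also reject names that merely share one of those letters in the same place, so you could not control the exclusion any further.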
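A quick sketch of how that bracket pattern behaves, assuming market "fiot" and client "nomura" as in the question. The names ending in "prd" and "ust" are hypothetical and only there to show the side effects.

```python
import fnmatch

pattern = "fiot*2apex*nomura[!u][!a][!t]*.xml"

def matches(name):
    """Check one file name against the bracket-based fnmatch pattern."""
    return fnmatch.fnmatch(name, pattern)

# The "uat" file is rejected, as intended.
uat_result = matches("fiot_csv2apex_nomurauat.xml")

# The wanted file is rejected too: the three brackets consume ".xm",
# leaving only "l" where the pattern still needs ".xml".
plain_result = matches("fiot_csv2apex_nomura.xml")

# The "test" file slips through, because "tes" passes all three brackets.
test_result = matches("fiot_csv2apex_nomuratest.xml")

# Hypothetical names: "prd" passes, "ust" fails only because it starts with "u".
prd_result = matches("fiot_csv2apex_nomuraprd.xml")
ust_result = matches("fiot_csv2apex_nomuraust.xml")

print(uat_result, plain_result, test_result, prd_result, ust_result)
```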

Spare yourself the hassle, use fnmatch to get into the general ballpark, and then use a second step to exclude things you don't want.

files_dir = '/some/dir'
config_files = os.listdir(files_dir)

for file_name in config_files:
    if fnmatch.fnmatch(file_name, toapex_pattern) and "uat" not in file_name:
        print(file_name)
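The question ultimately wants to drop both "uat" and "test". One way to extend the two-step idea is sketched below; the helper name select and the EXCLUDED tuple are my own, not part of the original script.

```python
import fnmatch

# Substrings the question wants to exclude.
EXCLUDED = ("uat", "test")

def select(names, pattern, excluded=EXCLUDED):
    """Return names that match the fnmatch pattern and contain no excluded substring."""
    return [
        name for name in names
        if fnmatch.fnmatch(name, pattern)
        and not any(part in name for part in excluded)
    ]

names = [
    "fiot_csv2apex_nomuratest.xml",
    "fiot_csv2apex_nomurauat.xml",
    "fiot_csv2apex_nomura.xml",
]

selected = select(names, "fiot*2apex*nomura*.xml")
print(selected)  # ['fiot_csv2apex_nomura.xml']
```

Note that the substring check looks at the whole file name, so a market or client name that itself contains "uat" or "test" would be filtered out as well.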

Alternatively, use regex from the start.

import re

files_dir = '/some/dir'
config_files = os.listdir(files_dir)

# ...

toapex_pattern = re.escape(market) + '.*2apex.*' + re.escape(client) + '(?!uat).*\\.xml$'

for file_name in config_files:
    if re.match(toapex_pattern, file_name):
        print(file_name)
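The same regex can cover "test" as well by putting an alternation inside the lookahead. This is a sketch with market and client hard-coded to the question's values rather than read from sys.argv.

```python
import re

market, client = "fiot", "nomura"

# (?!uat|test) rejects names where "uat" or "test" follows the client name directly.
pattern = re.compile(
    re.escape(market) + r".*2apex.*" + re.escape(client) + r"(?!uat|test).*\.xml$"
)

names = [
    "fiot_csv2apex_nomuratest.xml",
    "fiot_csv2apex_nomurauat.xml",
    "fiot_csv2apex_nomura.xml",
]

regex_selected = [name for name in names if pattern.match(name)]
print(regex_selected)  # ['fiot_csv2apex_nomura.xml']
```

The lookahead only inspects the text immediately after the client name, so a "uat" or "test" appearing later in the file name would not be excluded by this pattern.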

Just throwing it in, you could call the script as python /test/scripts/regex.py nomura fiot b and use sys.argv[1], sys.argv[2] and sys.argv[3] directly, without having to split anything yourself first.
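A rough sketch of that idea follows. The parse_args helper and its usage message are hypothetical, and the demo passes a hand-built list so it runs without real command-line arguments.

```python
import sys

def parse_args(argv):
    """Return (client, market, suffix) from a sys.argv-style list."""
    if len(argv) != 4:
        raise SystemExit("usage: regex.py CLIENT MARKET SUFFIX")
    client, market, suffix = argv[1:4]
    return client, market, suffix

# In the real script this would be parse_args(sys.argv).
client, market, suffix = parse_args(["regex.py", "nomura", "fiot", "b"])
print(client, market, suffix)
```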

Upvotes: 1
