How to use an algorithm to make the white list function more effective?

Question

I have designed a white list function to filter file paths on Windows. There are three types of patterns to filter:

filter paths by suffix, for example, all txt files.
filter paths by prefix (from the left), for example, all paths that start with "C:\Windows\System32".
filter paths that contain a specific keyword, for example, all paths that contain "system".

The patterns are saved in the format:

patternList = [{'type': 'suffix', 'content': r'\.txt'},
               {'type': 'keyword', 'content': 'system'},
               {'type': 'left', 'content': r'C:\\Windows\\System32'}]

Every dict is a pattern, and all patterns are stored in a list called patternList.

Then I have another list called pathInfoObjectList, which contains many objects. Each object has an attribute called "filelist", which is a list of file paths.

Now I want to use the patterns to delete every matching path from each filelist.

My method is to convert each pattern to a regex.

My code is here:

import re

# Turn every pattern dict into a regex string.
patternRegexList = []
for each in patternList:
    if each['type'] == 'suffix':
        patternRegex = '.*?' + each['content'] + '$'
    elif each['type'] == 'keyword':
        patternRegex = '.*?' + each['content'] + '.*?'
    elif each['type'] == 'left':
        patternRegex = '^' + each['content'] + '.*?'
    patternRegexList.append(patternRegex)


# Iterate over a copy of each filelist so that removing items is safe.
for pathInfoObject in pathInfoObjectList:
    for path in pathInfoObject.filelist[:]:
        for patternRegex in patternRegexList:
            if re.match(patternRegex, path):
                pathInfoObject.filelist.remove(path)
                break

But I think my algorithm is naive: with three nested loops, it looks like $O(n^{3})$.

Is there a smarter way to do this?

Since I have found that my lack of algorithm knowledge makes my code inefficient, do you have any suggestions for learning algorithms better? I think learning by reading Introduction to Algorithms is too slow. Is there a more effective way to learn?

Julien Palard · Accepted Answer

It looks more like a blacklist than a whitelist, but if I have misunderstood, the logic is easy to invert.

I first tried to express your rules in a clearer and more flexible manner. I also avoided regexes where plain string methods are enough, since the regexes probably cost you a lot of time. Finally, by using any I avoid testing the remaining exclusion rules once one has matched. The break in your for loop has the same effect.

exclusion_rules = [
    lambda path: path.endswith('.txt'),
    lambda path: 'system' in path,
    lambda path: path.startswith(r'C:\Windows\System32')]

for pathInfoObject in pathInfoObjectList:
    # list(...) keeps filelist a real list on Python 3, where filter is lazy.
    pathInfoObject.filelist = list(filter(
        lambda path: not any(rule(path) for rule in exclusion_rules),
        pathInfoObject.filelist))
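
To illustrate the short-circuiting that any provides, here is a minimal sketch that records which rules actually run for a given path. The make_rule wrapper, the calls list, and the sample paths are hypothetical names added only for this demonstration; they are not part of your code.

```python
calls = []  # names of the rules that were actually evaluated


def make_rule(name, predicate):
    """Wrap a predicate so that every invocation is recorded in `calls`."""
    def rule(path):
        calls.append(name)
        return predicate(path)
    return rule


exclusion_rules = [
    make_rule('suffix', lambda path: path.endswith('.txt')),
    make_rule('keyword', lambda path: 'system' in path),
    make_rule('left', lambda path: path.startswith(r'C:\Windows\System32')),
]


def is_excluded(path):
    # any() stops consuming the generator at the first rule returning True.
    return any(rule(path) for rule in exclusion_rules)


first = is_excluded(r'C:\notes\todo.txt')    # the suffix rule matches at once
calls_after_first = list(calls)

del calls[:]
second = is_excluded(r'D:\photos\cat.png')   # no rule matches, so all three run
calls_after_second = list(calls)
```

For the .txt path only the first rule is evaluated, while a path that matches nothing pays for all three. Putting the cheapest or most frequently matching rules first is therefore worth considering.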

The same filtering can be written with a list comprehension instead of filter:

for pathInfoObject in pathInfoObjectList:
    pathInfoObject.filelist = [path for path in pathInfoObject.filelist if
                               not any(rule(path) for rule in exclusion_rules)]
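
If you want to keep your patternList format, a possible sketch is to build the test from it once, up front. This assumes each 'content' holds a literal string rather than a regex (so '.txt' instead of '\.txt'). The build_is_excluded helper and the PathInfo class are hypothetical stand-ins for illustration. It relies on str.endswith and str.startswith accepting a tuple of candidates.

```python
def build_is_excluded(pattern_list):
    """Return a predicate telling whether a path matches any pattern."""
    suffixes = tuple(p['content'] for p in pattern_list if p['type'] == 'suffix')
    prefixes = tuple(p['content'] for p in pattern_list if p['type'] == 'left')
    keywords = [p['content'] for p in pattern_list if p['type'] == 'keyword']

    def is_excluded(path):
        # endswith/startswith accept a tuple; an empty tuple never matches.
        return (path.endswith(suffixes)
                or path.startswith(prefixes)
                or any(keyword in path for keyword in keywords))
    return is_excluded


class PathInfo(object):
    """Hypothetical stand-in for the objects in pathInfoObjectList."""
    def __init__(self, filelist):
        self.filelist = filelist


# Assumption: contents are literal strings, not regex fragments.
patternList = [{'type': 'suffix', 'content': '.txt'},
               {'type': 'keyword', 'content': 'system'},
               {'type': 'left', 'content': r'C:\Windows\System32'}]

is_excluded = build_is_excluded(patternList)

pathInfoObjectList = [PathInfo([r'C:\Windows\System32\drivers\etc\hosts',
                                r'C:\Users\me\notes.txt',
                                r'C:\Users\me\photo.png',
                                r'D:\backup\system\config.ini'])]

for pathInfoObject in pathInfoObjectList:
    pathInfoObject.filelist = [path for path in pathInfoObject.filelist
                               if not is_excluded(path)]
```

With these sample paths only the photo.png entry survives. Note that the string tests are case-sensitive, just like the lambdas above. Every path still has to be checked against the rules, so the total work stays proportional to the number of paths times the number of patterns; the gain comes from cheaper tests, not from a different complexity class.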
