Kingname
Kingname

Reputation: 1362

How to use an algorithm to make the white list function more effective?

I have designed a white list function to filter file pathes in windows. There are three types of patterns to filter:

The patterns are saved in the format:

patternList = [{'type': 'suffix', 'content':'\.txt'},
            {'type': 'keyword', 'content':'system'},
            {'type': 'left', 'content': 'C:\Windows\System32'}]

every dict is a pattern, and all patterns are in a list called patternList.

Then, I have another list called pathInfoObjectList which contain many objects, each objects has an attribute called "filelist", which is a list. In the filelist, there are some file pathes.

Now, I want to use the pattern to delete every path in filelist.

My method is to change the pattern to regex to finish the work.

My codes is here:

patternRegexList = []
for each in patternList:
    if each['type'] == 'suffix':
        patternRegex = '.*?' + each['content'] + '$'
    elif each['type'] == 'keyword':
        patternRegex = '.*?' + each['content'] + '.*?'
    elif each['type'] == 'left':
        patternRegex = '^' + each['content'] + '.*?'
    patternRegexList.append(patternRegex)


for pathInfoObject in pathInfoObjectList:
    for path in pathInfoObject.filelist[:]:
        for patternRegex in patternRegexList:
            if re.match(patternRegex, path):
                pathInfoObject.filelist.remove(path)
                break

But I think my algorithm is so stupid, and it is O(n^{3}).

Do you have a smart way to finish the task?

As now I have found the lacking of the knowledge of algorithm makes my codes ineffective, do you have some suggestions for me to learn algorithm better? I think learning by reading Introduction to algorithms is too slow. Is there more effective way to learn?

Upvotes: 1

Views: 92

Answers (2)

volcano
volcano

Reputation: 3582

I do not think you need re for that - just use simple string matching. You also does not need dictionary here.

patternList = (( 'suffix', '.txt'),
               ('keyword', 'system'),
               ('left',  'C:\Windows\System32'))

matchFuncList = []
for pattern, text in patternList:
    if pattern == 'suffix':
        matchFuncList.append(lambda s: s.endswith(text))
    elif pattern == 'keyword':
        matchFuncList.append(lambda s: text in s)
    elif pattern == 'left':
        matchFuncList.append(lambda s: s.startswith(text))

Now, don't remove values from lists - rebuild lists

for pathInfoObject in pathInfoObjectList:
    pathInfoObject.fileList = [path for path in pathInfoObject.fileList 
                               if not any(matchFunc(path) 
                                          for matchFunc in matchFuncList)]

Upvotes: 0

Julien Palard
Julien Palard

Reputation: 11546

It look like it's more like a blacklist than a whitelist, but if I get it wrong, it's easy to fix it.

I tried to express your rules in a clearer and more flexible manner first. I tried to avoid using useless regex too, they probably cost you a lot of time. Finally by using any I avoid testing each exclusion rule when the first has matched. Using a continue in your for loop has the same effect.

exclusion_rules = [
    lambda path: path.endswith('.txt'),
    lambda path: 'system' in path,
    lambda path: path.startswith(r'c:\Windows\System32')]

for pathInfoObject in pathInfoObjectList:
    pathInfoObject.filelist = filter(
        lambda path: not any(rule(path) for rule in exclusion_rules),
        pathInfoObject.filelist)

Another way to do so with list comprehehension instead of filter:

for pathInfoObject in pathInfoObjectList:
    pathInfoObject.filelist = [path for path in pathInfoObject.filelist if
                               not any(rule(path) for rule in exclusion_rules)]

Upvotes: 0

Related Questions