Reputation: 1362
I have designed a white list function to filter file pathes in windows. There are three types of patterns to filter:
The patterns are saved in the format:
patternList = [{'type': 'suffix', 'content':'\.txt'},
{'type': 'keyword', 'content':'system'},
{'type': 'left', 'content': 'C:\Windows\System32'}]
every dict is a pattern, and all patterns are in a list called patternList.
Then, I have another list called pathInfoObjectList which contain many objects, each objects has an attribute called "filelist", which is a list. In the filelist, there are some file pathes.
Now, I want to use the pattern to delete every path in filelist.
My method is to change the pattern to regex to finish the work.
My codes is here:
patternRegexList = []
for each in patternList:
if each['type'] == 'suffix':
patternRegex = '.*?' + each['content'] + '$'
elif each['type'] == 'keyword':
patternRegex = '.*?' + each['content'] + '.*?'
elif each['type'] == 'left':
patternRegex = '^' + each['content'] + '.*?'
patternRegexList.append(patternRegex)
for pathInfoObject in pathInfoObjectList:
for path in pathInfoObject.filelist[:]:
for patternRegex in patternRegexList:
if re.match(patternRegex, path):
pathInfoObject.filelist.remove(path)
break
But I think my algorithm is so stupid, and it is .
Do you have a smart way to finish the task?
As now I have found the lacking of the knowledge of algorithm makes my codes ineffective, do you have some suggestions for me to learn algorithm better? I think learning by reading Introduction to algorithms is too slow. Is there more effective way to learn?
Upvotes: 1
Views: 92
Reputation: 3582
I do not think you need re for that - just use simple string matching. You also does not need dictionary here.
patternList = (( 'suffix', '.txt'),
('keyword', 'system'),
('left', 'C:\Windows\System32'))
matchFuncList = []
for pattern, text in patternList:
if pattern == 'suffix':
matchFuncList.append(lambda s: s.endswith(text))
elif pattern == 'keyword':
matchFuncList.append(lambda s: text in s)
elif pattern == 'left':
matchFuncList.append(lambda s: s.startswith(text))
Now, don't remove values from lists - rebuild lists
for pathInfoObject in pathInfoObjectList:
pathInfoObject.fileList = [path for path in pathInfoObject.fileList
if not any(matchFunc(path)
for matchFunc in matchFuncList)]
Upvotes: 0
Reputation: 11546
It look like it's more like a blacklist than a whitelist, but if I get it wrong, it's easy to fix it.
I tried to express your rules in a clearer and more flexible manner first. I tried to avoid using useless regex too, they probably cost you a lot of time. Finally by using any
I avoid testing each exclusion rule when the first has matched. Using a continue
in your for loop has the same effect.
exclusion_rules = [
lambda path: path.endswith('.txt'),
lambda path: 'system' in path,
lambda path: path.startswith(r'c:\Windows\System32')]
for pathInfoObject in pathInfoObjectList:
pathInfoObject.filelist = filter(
lambda path: not any(rule(path) for rule in exclusion_rules),
pathInfoObject.filelist)
Another way to do so with list comprehehension instead of filter:
for pathInfoObject in pathInfoObjectList:
pathInfoObject.filelist = [path for path in pathInfoObject.filelist if
not any(rule(path) for rule in exclusion_rules)]
Upvotes: 0