Reputation: 902
I have a set of filenames coming from two different directories.
currList=set(['pathA/file1', 'pathA/file2', 'pathB/file3', etc.])
My code processes the files and needs to update currList by comparing it with its content from the previous iteration, say processList. For that, I compute a symmetric difference:
toProcess=set(currList).symmetric_difference(set(processList))
Actually, I need symmetric_difference to operate on the basename (file1, ...), not on the complete filename (pathA/file1).
I guess I need to reimplement the __eq__ operator, but I have no clue how to do that in Python. Is reimplementing __eq__ the right approach?
Upvotes: 1
Views: 230
Reputation: 37279
Here is a token (and likely poorly constructed) itertools version that should run a little bit faster if speed ever becomes a concern (although I agree that @Zarkonnen's one-liner is pretty sweet, so +1 there :) ).
from itertools import ifilter  # Python 2 only; in Python 3 the built-in filter is already lazy
currList = set(['pathA/file1', 'pathA/file2', 'pathB/file3'])
processList=set(['pathA/file1', 'pathA/file9', 'pathA/file3'])
# This can also be a lambda inside the map functions - the speed stays the same
def FileName(f):
    return f.split('/')[-1]
# diff will be a set of filenames with no path that will be checked during
# the ifilter process
curr = map(FileName, list(currList))
process = map(FileName, list(processList))
diff = set(curr).symmetric_difference(set(process))
# This filters out any elements from the symmetric difference of the two sets
# where the filename is not in the diff set
results = set(ifilter(lambda x: x.split('/')[-1] in diff,
                      currList.symmetric_difference(processList)))
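For anyone on Python 3, where ifilter no longer exists, the same idea can be sketched roughly as follows. This is only an illustration, not the code above verbatim: it swaps in os.path.basename for the manual split and the built-in filter for ifilter, and the snake_case variable names are my own.

```python
import os.path

curr_list = {'pathA/file1', 'pathA/file2', 'pathB/file3'}
process_list = {'pathA/file1', 'pathA/file9', 'pathA/file3'}

# Basenames that occur in exactly one of the two sets.
diff = ({os.path.basename(p) for p in curr_list}
        ^ {os.path.basename(p) for p in process_list})

# From the full-path symmetric difference, keep only the paths
# whose basename is in diff.
results = set(filter(lambda p: os.path.basename(p) in diff,
                     curr_list ^ process_list))
print(sorted(results))
```

With the sample data, only pathA/file2 and pathA/file9 survive, since file1 and file3 each appear under some path in both sets.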
Upvotes: 2
Reputation: 22478
You can do this with the magic of generator expressions.
def basename(x):
    return x.split("/")[-1]
result = set(x for x in set(currList).union(set(processList)) if (basename(x) in [basename(y) for y in currList]) != (basename(x) in [basename(y) for y in processList]))
should do the trick. It gives you all the elements X that appear in one list or the other, and whose basename-presence in the two lists is not the same.
Edit: Running this with:
currList=set(['pathA/file1', 'pathA/file2', 'pathB/file3'])
processList=set(['pathA/file1', 'pathA/file9', 'pathA/file3'])
returns:
set(['pathA/file2', 'pathA/file9'])
which would appear to be correct.
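If the sets get large, the one-liner above rebuilds both basename lists for every element. A possible variation, sketched here in Python 3 under the assumption that the inputs are plain collections of path strings, computes the basename sets once. The function name basename_symmetric_difference is made up for this example.

```python
import os.path


def basename_symmetric_difference(a, b):
    """Return the full paths from a or b whose basename occurs in only one of them."""
    names_a = {os.path.basename(p) for p in a}
    names_b = {os.path.basename(p) for p in b}
    # Basenames present in exactly one collection.
    only_one = names_a ^ names_b
    return {p for p in set(a) | set(b) if os.path.basename(p) in only_one}


currList = {'pathA/file1', 'pathA/file2', 'pathB/file3'}
processList = {'pathA/file1', 'pathA/file9', 'pathA/file3'}
result = basename_symmetric_difference(currList, processList)
print(sorted(result))
```

On the sample data this selects the same two paths as the generator expression, pathA/file2 and pathA/file9.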
Upvotes: 1