Bruno von Paris

Reputation: 902

re-implement __eq__ to compare sets with symmetric_difference in python

I have a set of filenames coming from two different directories.

currList = set(['pathA/file1', 'pathA/file2', 'pathB/file3'])  # etc.

My code processes the files and needs to update currList by comparing it to its content at the previous iteration, say processList. For that, I compute a symmetric difference:

toProcess=set(currList).symmetric_difference(set(processList))

Actually, I need the symmetric_difference to operate on the basename (file1...) not on the complete filename (pathA/file1).

I guess I need to reimplement the __eq__ operator, but I have no clue how to do that in Python.

  1. Is reimplementing __eq__ the right approach? Or
  2. is there a better or equivalent approach?

Upvotes: 1

Views: 230

Answers (2)

RocketDonkey

Reputation: 37279

Here is a token (and likely poorly constructed) itertools version that should run a little faster if speed ever becomes a concern (although I agree that @Zarkonnen's one-liner is pretty sweet, so +1 there :) ).

try:
    from itertools import ifilter  # Python 2
except ImportError:
    ifilter = filter  # Python 3: the built-in filter is already lazy

currList = set(['pathA/file1', 'pathA/file2', 'pathB/file3'])
processList = set(['pathA/file1', 'pathA/file9', 'pathA/file3'])

# This can also be a lambda inside the map functions - the speed stays the same
def FileName(f):
    return f.split('/')[-1]

# diff will be a set of filenames with no path that will be checked during
# the ifilter process
curr = map(FileName, currList)
process = map(FileName, processList)
diff = set(curr).symmetric_difference(set(process))

# Keep only those elements of the symmetric difference of the two sets
# whose filename is in the diff set
results = set(ifilter(lambda x: FileName(x) in diff,
                      currList.symmetric_difference(processList)))
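
As for the __eq__ route the question asks about: it can work, but only if __hash__ is overridden too, because sets look elements up by hash before they compare them. Below is a rough sketch of that idea. The ByBasename wrapper class is a hypothetical name of mine, not something from the standard library, and I am assuming os.path.basename is an acceptable way to strip the directory part.

```python
import os.path


class ByBasename(object):
    """Hypothetical wrapper: two instances are equal if their basenames match."""

    def __init__(self, path):
        self.path = path
        self.name = os.path.basename(path)

    def __eq__(self, other):
        if not isinstance(other, ByBasename):
            return NotImplemented
        return self.name == other.name

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self.name)


currList = set(['pathA/file1', 'pathA/file2', 'pathB/file3'])
processList = set(['pathA/file1', 'pathA/file9', 'pathA/file3'])

wrapped_curr = set(ByBasename(p) for p in currList)
wrapped_process = set(ByBasename(p) for p in processList)

# The set operation now compares basenames; unwrap to get full paths back.
toProcess = set(w.path
                for w in wrapped_curr.symmetric_difference(wrapped_process))
```

One caveat with this approach: two paths in the same set that share a basename (say pathA/file1 and pathB/file1) collapse into a single wrapped element, so only one of them can ever show up in the result.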

Upvotes: 2

Zarkonnen

Reputation: 22478

You can do this with the magic of generator expressions.

def basename(x):
    return x.split("/")[-1]

result = set(
    x for x in set(currList).union(set(processList))
    if (basename(x) in [basename(y) for y in currList])
    != (basename(x) in [basename(y) for y in processList])
)

should do the trick. It gives you all the elements X that appear in one list or the other, and whose basename-presence in the two lists is not the same.

Edit: Running this with:

currList=set(['pathA/file1', 'pathA/file2', 'pathB/file3'])
processList=set(['pathA/file1', 'pathA/file9', 'pathA/file3'])

returns:

set(['pathA/file2', 'pathA/file9'])

which would appear to be correct.
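
If this comparison is needed in more than one place, the same "basename-presence differs" test could be wrapped in a small helper that builds the basename sets once instead of rebuilding the lists for every element. This is only a sketch: symmetric_difference_by and its key parameter are names made up for illustration, and os.path.basename stands in for the basename function above.

```python
import os.path


def symmetric_difference_by(a, b, key):
    """Return the elements of a and b whose key occurs in exactly one of them."""
    a_keys = set(key(x) for x in a)
    b_keys = set(key(x) for x in b)
    # Keys present on one side only.
    changed = a_keys.symmetric_difference(b_keys)
    return set(x for x in set(a).union(b) if key(x) in changed)


currList = set(['pathA/file1', 'pathA/file2', 'pathB/file3'])
processList = set(['pathA/file1', 'pathA/file9', 'pathA/file3'])

result = symmetric_difference_by(currList, processList, os.path.basename)
```

For the sample data this should produce the same two paths as the generator expression.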

Upvotes: 1
