user8370684
user8370684

Reputation:

How to find set intersection by a custom comparison function?

Let's say I have the following:

src = itertools.chain(*map(lambda t: map(lambda u: ((t[0], ) + os.path.splitext(u)), t[2]), os.walk(src_folder)))
dst = itertools.chain(*map(lambda t: map(lambda u: ((t[0], ) + os.path.splitext(u)), t[2]), os.walk(dst_folder)))

This creates two lists of the format [(folder, base name, ext)] for two directories.

I want to be able to find the files that are common in src and dst. I can do this with set(src) & set(dst) as documented. But, what if I want to do it only by the folder and base name, and not by the extension? In other words, what if I want to do set intersection by a custom rule/function? How do I go about doing this?

Upvotes: 2

Views: 1407

Answers (1)

abarnert
abarnert

Reputation: 365807

In other words, what if I want to do set intersection by a custom rule/function? How do I go about doing this?

You can't. The whole reason set intersection is so fast and simple is that Python can immediately check whether a value is an element of a set, without having to compare it to all of the elements of the set.

But what you can do is transform the sets as you build them, then intersect those:

{os.path.basename(path) for path in src} & {os.path.basename(path) for path in dst}

The problem is, this doesn't give you the full names whose basenames are in the intersection, it only gives you the basenames that are in the intersection. How can you fix that?

The easiest solution is to use a dict instead of a set. You can then use its keys view as a set, and then go back and get the corresponding values:

srcmap = {os.path.basename(path): path for path in src}
srcisect = srcmap.keys() & {os.path.basename(path) for path in dst}
result = {srcmap[key] for key in srcisect}

This may look like a lot more work, but it's really just 4 linear passes instead of 3 (and the extra one is just over the intersection rather than one of the original lists), so at worst the performance will only be worse by a small constant factor.

Upvotes: 2

Related Questions