find_all

Reputation: 197

List intersection for multiple matching instances

I have an algorithm whose behavior depends on the number of matching instances (including repeats) between two lists. For example:

a = ["test", "win", "win", "bike", "bike", "bike", "gem", "nine"]
b = ["test", "win", "let", "bike", "four"]

d = set(a).intersection(b)

Would give me:

{"test", "win", "bike"}

The output I would like is:

{"test", "win", "win", "bike", "bike", "bike"}

I figure I could take the intersection and count how many times each intersected word occurs in list a, but that is quite a few extra steps, and I am hoping there is a simpler way to achieve this output.

My question is, based on the provided example lists, how can I achieve the desired second output of:

{"test", "win", "win", "bike", "bike", "bike"}

Upvotes: 1

Views: 72

Answers (2)

blhsing

Reputation: 107134

You can use collections.Counter to count each distinct item in a, keep only the items that also appear in b (converting b to a set makes the membership lookups efficient), and then call the Counter.elements method to produce a list with each item repeated according to its count:

from collections import Counter

# a and b are the lists from the question
set_b = set(b)  # set for fast membership tests
filtered = Counter({k: c for k, c in Counter(a).items() if k in set_b})
print(list(filtered.elements()))

This outputs:

['test', 'win', 'win', 'bike', 'bike', 'bike']
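
As a self-contained sketch of the same idea, the snippet below wraps the Counter approach in a function. The name repeated_intersection is a hypothetical helper chosen for illustration, not part of collections, and the ordering of the result assumes Python 3.7+, where Counter keeps items in first-seen order.

```python
from collections import Counter


def repeated_intersection(a, b):
    """Return the items of a that also occur in b, repeated as often as they occur in a.

    Hypothetical helper for illustration only.
    """
    set_b = set(b)            # fast membership tests
    counts = Counter(a)       # how often each item occurs in a
    kept = Counter({k: c for k, c in counts.items() if k in set_b})
    return list(kept.elements())


a = ["test", "win", "win", "bike", "bike", "bike", "gem", "nine"]
b = ["test", "win", "let", "bike", "four"]

result = repeated_intersection(a, b)
print(result)
```

Only the counts from a are used here; how many times an item appears in b does not affect the result.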

Upvotes: 1

lmiguelvargasf

Reputation: 70003

You should use a list instead of a set, and you can build on the code you have already written:

a = ["test", "win", "win", "bike", "bike", "bike", "gem", "nine"]
b = ["test", "win", "let", "bike", "four"]

s = set(a).intersection(b)

output = []

for e in s:
    n = max(a.count(e), b.count(e))
    output.extend([e] * n)

>>> output
['win', 'win', 'bike', 'bike', 'bike', 'test']

Basically, this repeats each common element as many times as it appears in whichever list contains it most often. Because s is a set, the order of the elements in output may vary between runs.
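
If only the repetitions in a should count, as in the question's desired output, a single filtering pass is one possible alternative. This is a sketch that is not part of the answer above; it keeps the matching items in their original order from a and ignores how often an item is repeated in b.

```python
a = ["test", "win", "win", "bike", "bike", "bike", "gem", "nine"]
b = ["test", "win", "let", "bike", "four"]

set_b = set(b)  # fast membership tests

# Keep every occurrence in a whose value also appears somewhere in b.
output = [e for e in a if e in set_b]
print(output)
```

The two approaches differ when an item is repeated more often in b than in a: the max-based loop repeats it according to b's count, while this filter never produces more copies than a contains.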

Upvotes: 1
