Find values in a dictionary (with array of values) with duplicate keys

Question

I have a dictionary like this:

{1: array([1, 5, 7, 2, 8, 3, 4]),
 2: array([1, 10, 11, 12, 13, 8, 15]),
 3: array([10, 20, 21, 22, 23, 24, 25]),
 4: array([7, 30, 31, 32, 33, 34, 35])
}

How can I find the values that appear under more than one key, together with the keys that contain them?

For example, the output should look like this:

1 has appeared in keys 1 and 2

10 in keys 2 and 3

John Coleman · Accepted Answer

If I understand the question, the following works:

from numpy import array
from collections import defaultdict

d = {1: array([1, 5, 7, 2, 8, 3, 4]),
     2: array([1, 10, 11, 12, 13, 8, 15]),
     3: array([10, 20, 21, 22, 23, 24, 25]),
     4: array([7, 30, 31, 32, 33, 34, 35])
}

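# Invert the dictionary: map each value to the list of keys whose array contains it.
d2 = defaultdict(list)
for k, v in d.items():
    for x in v.tolist():  # tolist() yields plain Python ints, so the printed keys are not NumPy scalars
        d2[x].append(k)

# Keep only the values that occur under more than one key.
d2 = {k: v for k, v in d2.items() if len(v) > 1}
print(d2)
# prints {1: [1, 2], 7: [1, 4], 8: [1, 2], 10: [2, 3]}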
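
To report the result in the sentence form the question asks for, the same inverted-index idea can be wrapped in a small helper. This is only a sketch: the names shared_values and describe are made up for illustration, and plain lists stand in for the arrays (iterating over NumPy arrays should behave the same way). Unlike the loop above, it collects keys in a set, so a value repeated inside a single array is not mistaken for a value shared between keys.

```python
from collections import defaultdict

def shared_values(d):
    """Map each value to the sorted keys whose sequence contains it,
    keeping only values found under more than one key."""
    index = defaultdict(set)
    for key, values in d.items():
        for x in values:
            index[x].add(key)  # a set ignores repeats within one sequence
    return {x: sorted(keys) for x, keys in index.items() if len(keys) > 1}

def describe(shared):
    """Build one line per shared value, e.g. '1 has appeared in keys 1 and 2'."""
    lines = []
    for x, keys in shared.items():
        *head, last = keys  # every list has at least two keys
        joined = ", ".join(str(k) for k in head)
        lines.append(f"{x} has appeared in keys {joined} and {last}")
    return lines

# Plain lists stand in for the arrays from the question.
d = {1: [1, 5, 7, 2, 8, 3, 4],
     2: [1, 10, 11, 12, 13, 8, 15],
     3: [10, 20, 21, 22, 23, 24, 25],
     4: [7, 30, 31, 32, 33, 34, 35]}

for line in describe(shared_values(d)):
    print(line)
```

Note that this also reports 7 and 8, which the example output in the question does not list but which do occur under two keys each.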

Note that if the dictionary comes from a pandas DataFrame, there is probably a more direct pandas approach. Finding duplicate values in a dataset is such a basic operation that it would be surprising if the library did not already provide it.
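
For what it's worth, here is one way a pandas version might look. It is a sketch under assumptions, not the built-in shortcut speculated about above: it reshapes the dictionary into a long table with one row per (key, value) pair, and the column names "key" and "value" are chosen here purely for illustration.

```python
import numpy as np
import pandas as pd

d = {1: np.array([1, 5, 7, 2, 8, 3, 4]),
     2: np.array([1, 10, 11, 12, 13, 8, 15]),
     3: np.array([10, 20, 21, 22, 23, 24, 25]),
     4: np.array([7, 30, 31, 32, 33, 34, 35])}

# Long format: one row per (key, value) pair.
df = pd.DataFrame(
    [(k, int(x)) for k, v in d.items() for x in v],
    columns=["key", "value"],
)

# Drop repeats of a value within the same key, then collect the keys per value.
keys_per_value = df.drop_duplicates().groupby("value")["key"].apply(list)

# Keep only the values that occur under more than one key.
shared = keys_per_value[keys_per_value.map(len) > 1]
result = shared.to_dict()
print(result)
```

If the data already lives in a DataFrame with one row per observation, the reshaping step can be skipped and only the drop_duplicates/groupby part is needed.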
