Reputation: 123
I have two dictionaries. I need to find the keys that are missing from each one and, for the keys they share, check whether the values are the same or different.
dict1 = {..}
dict2 = {..}
#key values in a list that are missing in each
missing_in_dict1_but_in_dict2 = []
missing_in_dict2_but_in_dict1 = []
#key values in a list that are mismatched between the 2 dictionaries
mismatch = []
What's the most efficient way to do this?
Upvotes: 4
Views: 17539
Reputation: 1122082
You can use dictionary view objects, which act as sets. Subtract sets to get the difference:
missing_in_dict1_but_in_dict2 = dict2.keys() - dict1
missing_in_dict2_but_in_dict1 = dict1.keys() - dict2
For the keys that are the same, use the intersection, with the & operator:
mismatch = {key for key in dict1.keys() & dict2 if dict1[key] != dict2[key]}
If you are still using Python 2, use dict.viewkeys() instead of dict.keys().
Using dictionary views to produce intersections and differences is very efficient: the view objects themselves are very lightweight, and the algorithms that create the new sets from the set operations can make direct use of the O(1) lookup behaviour of the underlying dictionaries.
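The three expressions above can be wrapped in one helper. This is only a minimal sketch for Python 3; the function name compare_dicts and the sample dictionaries old and new are made up for illustration and are not from the question.

```python
def compare_dicts(dict1, dict2):
    """Return three sets of keys: missing in dict1, missing in dict2, mismatched."""
    # Key views support set operations directly, so no set() copies are needed.
    missing_in_dict1 = dict2.keys() - dict1.keys()
    missing_in_dict2 = dict1.keys() - dict2.keys()
    # Only keys present in both dictionaries can have mismatched values.
    mismatch = {k for k in dict1.keys() & dict2.keys() if dict1[k] != dict2[k]}
    return missing_in_dict1, missing_in_dict2, mismatch


# Hypothetical sample data, purely for illustration.
old = {"host": "a.example", "port": 80, "debug": False}
new = {"host": "a.example", "port": 8080, "timeout": 30}

missing_in_old, missing_in_new, changed = compare_dicts(old, new)
print(missing_in_old)  # keys only in new
print(missing_in_new)  # keys only in old
print(changed)         # shared keys whose values differ
```

All three results are plain sets, so they can be sorted or converted to lists if the caller needs the list form used in the question.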
Demo:
>>> dict1 = {'foo': 42, 'bar': 81}
>>> dict2 = {'bar': 117, 'spam': 'ham'}
>>> dict2.keys() - dict1
{'spam'}
>>> dict1.keys() - dict2
{'foo'}
>>> {key for key in dict1.keys() & dict2 if dict1[key] != dict2[key]}
{'bar'}
and a performance comparison with creating separate set() objects:
>>> import timeit
>>> import random
>>> def difference_views(d1, d2):
...     missing1 = d2.keys() - d1
...     missing2 = d1.keys() - d2
...     mismatch = {k for k in d1.keys() & d2 if d1[k] != d2[k]}
...     return missing1, missing2, mismatch
...
>>> def difference_sets(d1, d2):
...     missing1 = set(d2) - set(d1)
...     missing2 = set(d1) - set(d2)
...     mismatch = {k for k in set(d1) & set(d2) if d1[k] != d2[k]}
...     return missing1, missing2, mismatch
...
>>> testd1 = {random.randrange(1000000): random.randrange(1000000) for _ in range(10000)}
>>> testd2 = {random.randrange(1000000): random.randrange(1000000) for _ in range(10000)}
>>> timeit.timeit('d(d1, d2)', 'from __main__ import testd1 as d1, testd2 as d2, difference_views as d', number=1000)
1.8643521590274759
>>> timeit.timeit('d(d1, d2)', 'from __main__ import testd1 as d1, testd2 as d2, difference_sets as d', number=1000)
2.811345119960606
Using set() objects is slower, especially when your input dictionaries get larger.
Upvotes: 10
Reputation: 95652
One easy way is to create sets from the dict keys and subtract them:
>>> dict1 = { 'a': 1, 'b': 1 }
>>> dict2 = { 'b': 1, 'c': 1 }
>>> missing_in_dict1_but_in_dict2 = set(dict2) - set(dict1)
>>> missing_in_dict1_but_in_dict2
set(['c'])
>>> missing_in_dict2_but_in_dict1 = set(dict1) - set(dict2)
>>> missing_in_dict2_but_in_dict1
set(['a'])
Or you can avoid casting the second dict to a set by using .difference():
>>> set(dict1).difference(dict2)
set(['a'])
>>> set(dict2).difference(dict1)
set(['c'])
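The snippets above only cover the missing keys. The mismatched values asked about in the question could be found in the same style with .intersection(), which also accepts a dict directly. This is a rough sketch rather than the only way to do it; the extra key 'd' is added to the sample data just so there is something to mismatch.

```python
dict1 = {'a': 1, 'b': 1, 'd': 2}
dict2 = {'b': 1, 'c': 1, 'd': 3}

# Keys present in both dictionaries; dict2 is iterated by key, no set() cast needed.
shared = set(dict1).intersection(dict2)

# Of the shared keys, keep those whose values differ.
mismatch = {key for key in shared if dict1[key] != dict2[key]}
print(mismatch)
```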
Upvotes: 2