jo2083248

Reputation: 123

How to find all differences between two dictionaries efficiently in Python

I have two dictionaries. I need to find the keys that are missing from each one and, for the keys they share, check whether the values are the same or different.

dict1 = {..}
dict2 = {..}
# keys that are missing from each dictionary, as lists
missing_in_dict1_but_in_dict2 = []
missing_in_dict2_but_in_dict1 = []
# keys present in both dictionaries whose values differ, as a list
mismatch = []

What's the most efficient way to do this?

Upvotes: 4

Views: 17539

Answers (2)

Martijn Pieters

Reputation: 1122082

You can use dictionary view objects, which act as sets. Subtract sets to get the difference:

missing_in_dict1_but_in_dict2 = dict2.keys() - dict1
missing_in_dict2_but_in_dict1 = dict1.keys() - dict2

For the keys that are the same, use the intersection, with the & operator:

mismatch = {key for key in dict1.keys() & dict2 if dict1[key] != dict2[key]}

If you are still using Python 2, use dict.viewkeys() instead of dict.keys().
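The three expressions above can be bundled into one helper. This is a minimal sketch, assuming Python 3; the function name dict_diff and the sample dictionaries are made up for illustration:

```python
def dict_diff(dict1, dict2):
    """Return (missing_in_dict1, missing_in_dict2, mismatch) as sets of keys."""
    # Keys present in dict2 but absent from dict1.
    missing_in_dict1 = dict2.keys() - dict1
    # Keys present in dict1 but absent from dict2.
    missing_in_dict2 = dict1.keys() - dict2
    # Shared keys whose values differ.
    mismatch = {key for key in dict1.keys() & dict2 if dict1[key] != dict2[key]}
    return missing_in_dict1, missing_in_dict2, mismatch


# Hypothetical sample data: 'eggs' matches, 'bar' differs.
missing1, missing2, mismatch = dict_diff(
    {'foo': 42, 'bar': 81, 'eggs': 1},
    {'bar': 117, 'spam': 'ham', 'eggs': 1},
)
print(missing1, missing2, mismatch)
```

All three results are sets; wrap them in list() or sorted() if you need lists, as in the question.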

Using dictionary views to produce intersections and differences is very efficient: the view objects themselves are very lightweight, and the algorithms that create the new sets from the set operations can make direct use of the O(1) lookup behaviour of the underlying dictionaries.
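To illustrate what "lightweight" means here, the following sketch (Python 3 assumed, variable names chosen for illustration) shows that a keys view is a live window onto the dictionary rather than a copy, while a set operation on it produces a new, independent set:

```python
d = {'foo': 42, 'bar': 81}
keys = d.keys()          # a view; no keys are copied here

d['spam'] = 'ham'        # mutate the dictionary after creating the view
print('spam' in keys)    # the view reflects the new key

# A set operation on a view builds a new set object.
common = keys & {'foo', 'spam', 'eggs'}
print(common)

del d['foo']             # later changes do not affect the result set
print('foo' in common, 'foo' in keys)
```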

Demo:

>>> dict1 = {'foo': 42, 'bar': 81}
>>> dict2 = {'bar': 117, 'spam': 'ham'}
>>> dict2.keys() - dict1
{'spam'}
>>> dict1.keys() - dict2
{'foo'}
>>> {key for key in dict1.keys() & dict2 if dict1[key] != dict2[key]}
{'bar'}

And a performance comparison with creating separate set() objects:

>>> import timeit
>>> import random
>>> def difference_views(d1, d2):
...     missing1 = d2.keys() - d1
...     missing2 = d1.keys() - d2
...     mismatch = {k for k in d1.keys() & d2 if d1[k] != d2[k]}
...     return missing1, missing2, mismatch
...
>>> def difference_sets(d1, d2):
...     missing1 = set(d2) - set(d1)
...     missing2 = set(d1) - set(d2)
...     mismatch = {k for k in set(d1) & set(d2) if d1[k] != d2[k]}
...     return missing1, missing2, mismatch
...
>>> testd1 = {random.randrange(1000000): random.randrange(1000000) for _ in range(10000)}
>>> testd2 = {random.randrange(1000000): random.randrange(1000000) for _ in range(10000)}
>>> timeit.timeit('d(d1, d2)', 'from __main__ import testd1 as d1, testd2 as d2, difference_views as d', number=1000)
1.8643521590274759
>>> timeit.timeit('d(d1, d2)', 'from __main__ import testd1 as d1, testd2 as d2, difference_sets as d', number=1000)
2.811345119960606

Using set() objects is slower, especially when your input dictionaries get larger.

Upvotes: 10

Duncan

Reputation: 95652

One easy way is to create sets from the dict keys and subtract them:

>>> dict1 = { 'a': 1, 'b': 1 }
>>> dict2 = { 'b': 1, 'c': 1 }
>>> missing_in_dict1_but_in_dict2 = set(dict2) - set(dict1)
>>> missing_in_dict1_but_in_dict2
set(['c'])
>>> missing_in_dict2_but_in_dict1 = set(dict1) - set(dict2)
>>> missing_in_dict2_but_in_dict1
set(['a'])

Or you can avoid converting the second dict to a set by using .difference(), which accepts any iterable:

>>> set(dict1).difference(dict2)
set(['a'])
>>> set(dict2).difference(dict1)
set(['c'])
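The same idea extends to the mismatched values asked about in the question, since .intersection() also accepts any iterable. This is a rough sketch assuming Python 3 (where sets print as {'d'} rather than set(['d'])); the extra key 'd' is added to the sample data purely for illustration:

```python
dict1 = {'a': 1, 'b': 1, 'd': 2}
dict2 = {'b': 1, 'c': 1, 'd': 3}

# Keys that both dictionaries share; dict2 is iterated directly, not copied to a set.
shared = set(dict1).intersection(dict2)
# Of the shared keys, keep those whose values differ.
mismatch = {key for key in shared if dict1[key] != dict2[key]}
print(mismatch)
```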

Upvotes: 2
