Reputation: 101
a_standard = {
    'section1': {
        'category1': 1,
        'category2': 2
    },
    'section2': {
        'category1': 1,
        'category2': 2
    }
}
a_new = {
    'section1': {
        'category1': 1,
        'category2': 2
    },
    'section2': {
        'category1': 1,
        'category2': 3
    }
}
I want to find the difference between a_standard and a_new, which is at a_new['section2']['category2']: the value is 2 in a_standard and 3 in a_new.
Should I convert each dict to a set and take the difference, or loop over the dicts and compare them?
Upvotes: 3
Views: 10508
Reputation: 1083
There is a library called deepdiff that has a lot of options, but I find it to be somewhat unintuitive.
Here's a recursive function that I often use to compute diffs during my unit tests. This goes a bit beyond what the question asks, because I take care of the case of lists being nested as well. I hope you'll find it useful.
Function definition:
from copy import deepcopy

def deep_diff(x, y, parent_key=None, exclude_keys=(), epsilon_keys=()):
    """
    Take the deep diff of JSON-like dictionaries
    No warranties when keys, or values are None
    """
    EPSILON = 0.5
    rho = 1 - EPSILON

    if x == y:
        return None

    # Values under epsilon_keys count as equal when they are close enough
    if parent_key in epsilon_keys:
        xfl, yfl = float_or_None(x), float_or_None(y)
        if xfl and yfl and xfl * yfl >= 0 and rho * xfl <= yfl and rho * yfl <= xfl:
            return None

    # Leaves, or values of mismatched type, are reported as an (x, y) pair
    if type(x) != type(y) or type(x) not in [list, dict]:
        return x, y

    if type(x) == dict:
        d = {}
        # Keys present on only one side
        for k in x.keys() ^ y.keys():
            if k in exclude_keys:
                continue
            if k in x:
                d[k] = (deepcopy(x[k]), None)
            else:
                d[k] = (None, deepcopy(y[k]))

        # Keys present on both sides: recurse
        for k in x.keys() & y.keys():
            if k in exclude_keys:
                continue

            next_d = deep_diff(x[k], y[k], parent_key=k, exclude_keys=exclude_keys, epsilon_keys=epsilon_keys)
            if next_d is None:
                continue

            d[k] = next_d

        return d if d else None

    # assume a list:
    d = [None] * max(len(x), len(y))
    flipped = False
    if len(x) > len(y):
        flipped = True
        x, y = y, x

    for i, x_val in enumerate(x):
        if flipped:
            d[i] = deep_diff(y[i], x_val, parent_key=i, exclude_keys=exclude_keys, epsilon_keys=epsilon_keys)
        else:
            d[i] = deep_diff(x_val, y[i], parent_key=i, exclude_keys=exclude_keys, epsilon_keys=epsilon_keys)

    # Trailing elements that exist only in the longer list
    for i in range(len(x), len(y)):
        d[i] = (y[i], None) if flipped else (None, y[i])

    return None if all(map(lambda x: x is None, d)) else d

# We need this helper function as well:
def float_or_None(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        # float(None) or float([1]) raises TypeError rather than ValueError
        return None
Usage:
>>> deep_diff(a_standard, a_new)
{'section2': {'category2': (2, 3)}}
I think the output is a little more intuitive than the other answers.
In unit tests I'll do something like:
import json
diff = deep_diff(expected_out, out, exclude_keys=["flickery1", "flickery2"])
assert diff is None, json.dumps(diff, indent=2)
Upvotes: 4
Reputation: 688
You can do this, assuming both dicts have the same keys:
def find_diff(dict1, dict2):
    differences = []
    for key in dict1.keys():
        if isinstance(dict1[key], dict):
            # Collect nested differences instead of returning early,
            # so the remaining keys are still checked.
            differences.extend(find_diff(dict1[key], dict2[key]))
        elif dict1[key] != dict2[key]:
            differences.append((key, dict1[key], dict2[key]))
    return differences
I’m typing on my phone right now, so sorry if the syntax is a little messed up.
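If the two dicts might not share the same keys, one option is to flatten each dict into a mapping from key path to leaf value and compare those with set operations on the keys. This is only a minimal sketch: `flatten`, `find_diff_by_path` and the `MISSING` sentinel are hypothetical names, not part of the answer above, and empty nested dicts are ignored because they contain no leaves.

```python
# Hypothetical sentinel marking a path that exists on only one side.
MISSING = object()


def flatten(d, prefix=()):
    """Map each leaf value to the tuple of keys leading to it."""
    flat = {}
    for key, value in d.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def find_diff_by_path(dict1, dict2):
    """Return {path: (old, new)} for every leaf that differs."""
    flat1, flat2 = flatten(dict1), flatten(dict2)
    diffs = {}
    # The union of both key sets covers paths missing from either dict.
    for path in flat1.keys() | flat2.keys():
        old = flat1.get(path, MISSING)
        new = flat2.get(path, MISSING)
        if old != new:
            diffs[path] = (old, new)
    return diffs


a_standard = {
    'section1': {'category1': 1, 'category2': 2},
    'section2': {'category1': 1, 'category2': 2},
}
a_new = {
    'section1': {'category1': 1, 'category2': 2},
    'section2': {'category1': 1, 'category2': 3},
}

print(find_diff_by_path(a_standard, a_new))
# {('section2', 'category2'): (2, 3)}
```

Keying the result by the full path keeps two differences with the same leaf key in different sections apart.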
Upvotes: 2
Reputation: 71451
You can use recursion:
a_standard = {
    'section1': {
        'category1': 1,
        'category2': 2
    },
    'section2': {
        'category1': 1,
        'category2': 2
    }
}
a_new = {
    'section1': {
        'category1': 1,
        'category2': 2
    },
    'section2': {
        'category1': 1,
        'category2': 3
    }
}
def differences(a, b, section=None):
    return [
        (c, d, g, section) if all(not isinstance(i, dict) for i in [d, g]) and d != g
        else None if all(not isinstance(i, dict) for i in [d, g]) and d == g
        else differences(d, g, c)
        for [c, d], [h, g] in zip(a.items(), b.items())
    ]

# filter() returns a lazy iterator in Python 3, so wrap it in list() to display the result
n = list(filter(None, [i for b in differences(a_standard, a_new) for i in b]))
print(n)
Output:
[('category2', 2, 3, 'section2')]
Each tuple holds the key, the two unequal values, and the section in which they were found.
Edit: without list comprehension:
def differences(a, b, section=None):
    for [c, d], [h, g] in zip(a.items(), b.items()):
        if not isinstance(d, dict) and not isinstance(g, dict):
            if d != g:
                yield (c, d, g, section)
        else:
            # Re-yield each tuple from the nested call whole; iterating over
            # the tuple itself would flatten it into separate values.
            yield from differences(d, g, c)

print(list(differences(a_standard, a_new)))
Output:
[('category2', 2, 3, 'section2')]
This solution uses a generator (hence the yield statement), which produces values lazily, one at a time, and only remembers where it left off. The values can be collected by passing the generator to list(). yield makes it easier to accumulate the differences and removes the need for an extra accumulator parameter or a global variable.
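To illustrate the lazy behaviour described above, here is a small sketch of a generator that reports the full key path and looks values up by key rather than relying on both dicts iterating in the same order. `iter_differences` and its `path` parameter are hypothetical names, and the sketch assumes both dicts have the same keys and nesting, as in the question.

```python
def iter_differences(a, b, path=()):
    """Lazily yield (path, old, new) for every leaf value that differs."""
    for key, old in a.items():
        new = b[key]  # look up by key, so dict ordering does not matter
        if isinstance(old, dict) and isinstance(new, dict):
            # Delegate to the nested generator and re-yield its items.
            yield from iter_differences(old, new, path + (key,))
        elif old != new:
            yield path + (key,), old, new


a_standard = {
    'section1': {'category1': 1, 'category2': 2},
    'section2': {'category1': 1, 'category2': 2},
}
a_new = {
    'section1': {'category1': 1, 'category2': 2},
    'section2': {'category1': 1, 'category2': 3},
}

# next() runs the generator only as far as the first difference.
gen = iter_differences(a_standard, a_new)
print(next(gen, None))
# (('section2', 'category2'), 2, 3)

# list() drains a fresh generator and collects every difference.
print(list(iter_differences(a_standard, a_new)))
# [(('section2', 'category2'), 2, 3)]
```

Using next() with a default is handy when you only need to know whether any difference exists, since the remaining keys are never visited.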
Upvotes: 3